Abstract. There have been several recent articles studying homology of various types of random simplicial complexes. Several theorems have concerned thresholds for vanishing of homology, and in some cases expectations of the Betti numbers. However, little seems to be known so far about limiting distributions of random Betti numbers.

In this article we establish Poisson and normal approximation theorems for Betti numbers of different kinds of random simplicial complex: Erdős–Rényi random clique complexes, random Vietoris–Rips complexes, and random Čech complexes. These results may be of practical interest in topological data analysis.



1. Introduction 

Several papers have recently appeared concerning the topology of random simplicial complexes [13 HJ HI [II [M IS]. The results so far identify thresholds for vanishing of homology, or compute the expectation of the Betti numbers $\mathbb{E}[\beta_k]$ (i.e. the expected ranks of the homology groups). In this article we prove Poisson and normal approximation theorems for $\beta_k$ for three models of random simplicial complex. The complexes themselves are defined precisely and given further motivation in the following sections, but we first outline our results.

The first model considered is that of the Erdős–Rényi random clique complex $X(n,p)$, a higher-dimensional analogue of the Erdős–Rényi random graph $G(n,p)$. It was shown in [11] that for each $k$ and a certain range of $p = p(n)$, $\beta_k \neq 0$ asymptotically almost surely (a.a.s.), and in this regime a formula for the asymptotic size of $\mathbb{E}[\beta_k]$ in terms of $p$ is given. (Outside of this regime it is conjectured that $\beta_k = 0$ a.a.s., and some evidence for the conjecture is given in [11].) Here we prove a central limit theorem for $\beta_k$. That is, we show that

$$\frac{\beta_k - \mathbb{E}[\beta_k]}{\sqrt{\operatorname{Var}(\beta_k)}} \Rightarrow N(0,1)$$

as $n \to \infty$, where $N(0,1)$ is the normal distribution with mean 0 and variance 1.

The second model considered is the random Čech complex. This model is a higher-dimensional analogue of the random geometric graph; the underlying graph is a random geometric graph, and the presence of $(k-1)$-dimensional faces is determined by $k$-fold intersections of balls centered at the vertices. Čech complexes are homotopy equivalent to Edelsbrunner and Mücke's alpha shapes, widely applied in computational geometry and topology [6]. The analysis needed to obtain limit theorems for the Betti numbers of random Čech complexes is more subtle than what is needed for the Erdős–Rényi model; to prove the normal and Poisson approximation theorems we must first establish limit theorems for certain hypergraph counts, extending some of Mathew Penrose's results for subgraph counts of random geometric graphs [15].

Figure 1. The Betti numbers of $X(n,p)$ plotted vertically against edge probability $p$; in this example $n = 100$. Computation and graphic courtesy of Afra Zomorodian.

The final type of complex considered is the random Vietoris–Rips complex, denoted $VR(n,r)$. This is similar to the random Čech complex; the construction is to take the clique complex of a random geometric graph. (A useful reference for geometric random graphs is [15].) The topology is very different from that of the clique complex of the Erdős–Rényi random graph; for the contrast between $X(n,p)$ and $VR(n,r)$ see Figures 1 and 2. The analysis needed to obtain limit theorems for the Betti numbers of $VR(n,r)$ is nevertheless essentially identical to that needed for the random Čech complex. A minor example of this fact is that, since $\beta_0$ counts connected components, $\beta_0$ is the same for the Čech and Rips complexes and is equal to the number of components of the underlying random geometric graph. This has already been treated in detail by Penrose [15], and so when convenient we restrict attention to $\beta_k$ for $k \geq 1$.

The techniques throughout the paper combine inequalities derived from combinatorial and topological considerations with Stein's method. (For an introduction to topological combinatorics see [5]; for a survey of Stein's method in proving Poisson approximation theorems see [5]; and for an introduction to Stein's method for normal approximation, see [17].)

1.1. Notation and conventions. Throughout this article, we use Bachmann–Landau big-O, little-o, and related notations. In particular, for non-negative functions $g$ and $h$, we write the following.
• $g(n) = O(h(n))$ means that there exist $n_0$ and $k$ such that for $n > n_0$, we have $g(n) \leq k \cdot h(n)$. (i.e. $g$ is asymptotically bounded above by $h$, up to a constant factor.)

• $g(n) = \Omega(h(n))$ means that there exist $n_0$ and $k$ such that for $n > n_0$, we have $g(n) \geq k \cdot h(n)$. (i.e. $g$ is asymptotically bounded below by $h$, up to a constant factor.)

• $g(n) = \Theta(h(n))$ means that $g(n) = O(h(n))$ and $g(n) = \Omega(h(n))$. (i.e. $g$ is asymptotically bounded above and below by $h$, up to constant factors.)

• $g(n) = o(h(n))$ means that for every $\varepsilon > 0$, there exists $n_0$ such that for $n > n_0$, we have $g(n) \leq \varepsilon \cdot h(n)$. (i.e. $g$ is dominated by $h$ asymptotically.)

• $g(n) = \omega(h(n))$ means that for every $k > 0$, there exists $n_0$ such that for $n > n_0$, we have $g(n) \geq k \cdot h(n)$. (i.e. $g$ dominates $h$ asymptotically.)

We may also write $A_n \sim B_n$ if $\lim_{n\to\infty} A_n/B_n = 1$, and $A_n \lesssim B_n$ if there is a constant $c$ such that $A_n \leq cB_n$ for all $n$.

A sequence $\{X_n\}_{n=1}^{\infty}$ of random variables is said to converge weakly to a limiting random variable $X$ (written $X_n \Rightarrow X$) if $\lim_{n\to\infty}\mathbb{E}[f(X_n)] = \mathbb{E}[f(X)]$ for all bounded continuous functions $f$ (there are several other equivalent definitions).

The total variation distance between random variables $X$ and $Y$ is defined by

$$d_{TV}(X,Y) := \sup_f \left|\mathbb{E}[f(X)] - \mathbb{E}[f(Y)]\right|,$$

with the supremum taken over all continuous functions bounded by one. Clearly, if $d_{TV}(X_n, X) \to 0$ as $n \to \infty$, then $X_n \Rightarrow X$; however, the topology induced by the total variation distance is stronger than the topology of weak convergence.

The $L_1$-Wasserstein distance, or Kantorovich–Rubinstein distance, between $X$ and $Y$ is defined by

$$d_1(X,Y) := \sup_f \left|\mathbb{E}[f(X)] - \mathbb{E}[f(Y)]\right|,$$

where the supremum is over all functions $f$ with $\sup_{x\neq y}\frac{|f(x)-f(y)|}{|x-y|} \leq 1$. This distance also induces a topology stronger than the topology of weak convergence.

Finally, the normal distribution with mean $\mu$ and variance $\sigma^2$ is denoted $N(\mu,\sigma^2)$, and the distribution function of the standard normal distribution is denoted $\Phi(t)$.

2. Erdős–Rényi random clique complexes

Perhaps the first type of random simplicial complex to be studied was the 1-dimensional one: the random graph of Erdős and Rényi [7].

Definition 2.1. The Erdős–Rényi random graph $G(n,p)$ is the probability space of all graphs on vertex set $[n] = \{1, 2, \ldots, n\}$ with each edge included independently with probability $p$.

The "clique complex" is used to generalize $G(n,p)$ from graphs to higher-dimensional simplicial complexes.

Definition 2.2. The clique complex $X(H)$ of a graph $H$ is the simplicial complex with vertex set $V(H)$ and a face for each set of vertices spanning a complete subgraph of $H$.

In other words, the clique complex $X(H)$ of a graph $H$ is the maximal simplicial complex with 1-skeleton $H$.

This section concerns the clique complex of the Erdős–Rényi random graph, i.e. $X(G(n,p))$. For simplicity of notation, this is denoted $X(n,p)$.

There are several motivations for using $X(n,p)$ as a model of a random simplicial complex. One motivation is that $X(n,p)$ provides a natural higher-dimensional generalization of $G(n,p)$, which has proved extremely useful in graph theory as well as in applications. (Other higher-dimensional generalizations are studied in [21 [TJl E3].) Another motivation comes from the fact that every simplicial complex is homeomorphic to the clique complex of some graph (e.g. by barycentric subdivision) [8].

One interesting feature of $X(n,p)$ is that it provides homological analogues of the Erdős–Rényi theorem, but in a non-monotone setting. If edges are added at random to an empty graph, the Erdős–Rényi theorem characterizes the number of edges needed before the graph becomes connected. Connectivity is a monotone graph property: if one adds edges to a connected graph, it is still connected.

Topologically, connectivity is equivalent to a statement about zeroth homology $H_0(G(n,p))$, but if one asks about $H_k(X(n,p))$, $k > 0$, there is a problem: adding edges generates $k$-dimensional faces and $(k+1)$-dimensional faces at the same time. Since generators and relations are both being added, there is no reason that things have to behave in a monotone way. In fact, it is not just that things might not be monotone; they are non-monotone in an essential way. In particular, there seem to be two thresholds for higher homology: one where $H_k$ passes from vanishing to non-vanishing, and another where it passes back to vanishing.

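The following theorem was proved in [11]. For any fixed $k > 0$, let $\beta_k$ denote the dimension of $k$th homology of a complex $\Delta$, i.e. $\beta_k = \dim[H_k(\Delta, \mathbb{Q})]$.

Theorem 2.3. If $p = \omega(n^{-1/k})$ and $p = o(n^{-1/(k+1)})$, then

$$\lim_{n\to\infty}\frac{\mathbb{E}[\beta_k(X(n,p))]}{\binom{n}{k+1}p^{\binom{k+1}{2}}} = 1.$$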

(In [11] explicit nontrivial homology classes are exhibited, and several partial converses of Theorem 2.3 are proved; in particular it is shown that if $p = O(n^{-1/k-\varepsilon})$ or $p = \Omega(n^{-1/(2k+1)+\varepsilon})$ for some constant $\varepsilon > 0$, then a.a.s. $\beta_k = 0$.)

The remainder of this section is devoted to showing that in the same regime, $\beta_k$ obeys a central limit theorem.

Theorem 2.4. If $p = \omega(n^{-1/k})$ and $p = o(n^{-1/(k+1)})$, then

$$\frac{\beta_k(X(n,p)) - \mathbb{E}[\beta_k(X(n,p))]}{\sqrt{\operatorname{Var}(\beta_k(X(n,p)))}} \Rightarrow N(0,1).$$

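Proof. For a finite simplicial complex $\Delta$, let $f_i(\Delta)$ (or simply $f_i$ if context is clear) denote the number of $i$-dimensional faces of $\Delta$. A useful fact when proving Theorems 2.3 and 2.4 is that $\beta_k$ satisfies the following "Morse" inequalities:

$$-f_{k-1} + f_k - f_{k+1} \leq \beta_k \leq f_k \tag{1}$$

for all $k$. These inequalities follow from the definition of simplicial homology and the rank–nullity law [8]: indeed, $\beta_k = f_k - \operatorname{rank}\partial_k - \operatorname{rank}\partial_{k+1}$, where $\partial_k$ is the $k$th boundary map, and $0 \leq \operatorname{rank}\partial_k \leq f_{k-1}$, $0 \leq \operatorname{rank}\partial_{k+1} \leq f_{k+1}$.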
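The inequalities (1) are easy to check numerically on small samples of $X(n,p)$. The following is a minimal Python sketch, not taken from this article: the helper names (`random_graph`, `clique_faces`, `morse_bounds`, and so on) are our own, faces are enumerated by brute force (so only very small $n$ are feasible), and $\beta_k$ is computed over $\mathbb{Q}$ as $f_k - \operatorname{rank}\partial_k - \operatorname{rank}\partial_{k+1}$ using exact rational Gaussian elimination.

```python
import itertools
import random
from fractions import Fraction


def random_graph(n, p, seed=None):
    """Sample G(n, p): each pair (i, j), i < j, is an edge independently with probability p."""
    rng = random.Random(seed)
    return {e for e in itertools.combinations(range(n), 2) if rng.random() < p}


def clique_faces(n, edges, dim):
    """dim-dimensional faces of the clique complex: vertex sets of (dim + 1)-cliques."""
    if dim < 0:
        return []
    return [s for s in itertools.combinations(range(n), dim + 1)
            if all(e in edges for e in itertools.combinations(s, 2))]


def rank_q(rows):
    """Rank over Q via exact Gaussian elimination on Fractions."""
    rows = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[rank][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def boundary_rank(faces, subfaces):
    """Rank of the boundary map sending each face to the alternating sum of its facets."""
    if not faces or not subfaces:
        return 0
    index = {s: i for i, s in enumerate(subfaces)}
    matrix = [[0] * len(faces) for _ in subfaces]
    for j, face in enumerate(faces):
        for i in range(len(face)):
            matrix[index[face[:i] + face[i + 1:]]][j] = (-1) ** i
    return rank_q(matrix)


def morse_bounds(n, edges, k):
    """Return (-f_{k-1} + f_k - f_{k+1}, beta_k, f_k) for the clique complex of ([n], edges)."""
    f_below, f_k, f_above = (clique_faces(n, edges, d) for d in (k - 1, k, k + 1))
    beta = len(f_k) - boundary_rank(f_k, f_below) - boundary_rank(f_above, f_k)
    return -len(f_below) + len(f_k) - len(f_above), beta, len(f_k)
```

For example, `morse_bounds(4, {(0, 1), (1, 2), (2, 3), (0, 3)}, 1)` treats the 4-cycle, whose clique complex is a circle, and returns `(0, 1, 4)`; `morse_bounds(12, random_graph(12, 0.5, seed=1), 1)` checks (1) on one random sample.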

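The next observation to make is that $X(n,p)$ is a clique complex, so $f_k$ counts the number of $(k+1)$-cliques. Since there are $\binom{n}{k+1}$ possible $(k+1)$-cliques and each appears with probability $p^{\binom{k+1}{2}}$,

$$\lim_{n\to\infty}\frac{\mathbb{E}[f_k]}{n^{k+1}p^{\binom{k+1}{2}}} = \frac{1}{(k+1)!}.$$

In particular $\mathbb{E}[f_{k-1}]/\mathbb{E}[f_k] \sim (k+1)/(np^k)$, so if $p = \omega(n^{-1/k})$ then

$$\frac{\mathbb{E}[f_{k-1}]}{\mathbb{E}[f_k]} = o(1),$$

and the same argument (now $\mathbb{E}[f_{k+1}]/\mathbb{E}[f_k] \sim np^{k+1}/(k+2)$) shows that if $p = o(n^{-1/(k+1)})$ then

$$\frac{\mathbb{E}[f_{k+1}]}{\mathbb{E}[f_k]} = o(1).$$

That is, in the regime of Theorems 2.3 and 2.4,

$$\lim_{n\to\infty}\frac{\mathbb{E}[f_k]}{\mathbb{E}[-f_{k-1}+f_k-f_{k+1}]} = 1,$$

which, in light of (1), reproves Theorem 2.3.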
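The two ratio estimates above can be seen directly from the exact formula $\mathbb{E}[f_i] = \binom{n}{i+1}p^{\binom{i+1}{2}}$. The sketch below (our illustration; the function names are hypothetical) evaluates both ratios for $k = 1$ and $p = n^{-3/4}$, a choice of $p$ lying strictly inside the regime $n^{-1/k} \ll p \ll n^{-1/(k+1)}$; both ratios shrink as $n$ grows, at rates of roughly $n^{-1/4}$ and $n^{-1/2}$.

```python
from math import comb


def expected_faces(n, p, i):
    """E[f_i] for X(n, p): binom(n, i+1) vertex sets, each a clique w.p. p^binom(i+1, 2)."""
    return comb(n, i + 1) * p ** comb(i + 1, 2)


def adjacent_ratios(n, p, k):
    """Return (E[f_{k-1}] / E[f_k], E[f_{k+1}] / E[f_k])."""
    middle = expected_faces(n, p, k)
    return expected_faces(n, p, k - 1) / middle, expected_faces(n, p, k + 1) / middle


# k = 1, p = n^(-3/4): strictly between n^(-1/k) and n^(-1/(k+1)).
for n in (10**3, 10**4, 10**5):
    below, above = adjacent_ratios(n, n ** -0.75, 1)
    print(f"n = {n:>6}: E[f_0]/E[f_1] = {below:.4f}, E[f_2]/E[f_1] = {above:.5f}")
```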



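Let $\tilde f_k := -f_{k-1} + f_k - f_{k+1}$. The following claim, together with (1), is used to show that $\beta_k$ satisfies a central limit theorem.

Claim 2.5.

(i) $$\lim_{n\to\infty}\frac{\operatorname{Var}(\tilde f_k)}{\operatorname{Var}(f_k)} = 1.$$

(ii) $$\frac{f_k - \mathbb{E}[f_k]}{\sqrt{\operatorname{Var}(f_k)}} \Rightarrow N(0,1) \quad\text{as } n\to\infty.$$

(iii) $$\frac{\tilde f_k - \mathbb{E}[\tilde f_k]}{\sqrt{\operatorname{Var}(\tilde f_k)}} \Rightarrow N(0,1) \quad\text{as } n\to\infty.$$

For $t \in \mathbb{R}$, it follows from (1) that

$$\mathbb{P}\left[\frac{f_k - \mathbb{E}[f_k]}{\sqrt{\operatorname{Var}(f_k)}} \leq t\right] \leq \mathbb{P}\left[\frac{\beta_k - \mathbb{E}[f_k]}{\sqrt{\operatorname{Var}(f_k)}} \leq t\right] \leq \mathbb{P}\left[\frac{\tilde f_k - \mathbb{E}[f_k]}{\sqrt{\operatorname{Var}(f_k)}} \leq t\right].$$

The left-hand side tends to $\Phi(t)$ as $n \to \infty$ by part (ii) of the claim. For the right-hand side, let $\varepsilon > 0$ and observe that

$$\mathbb{P}\left[\frac{\tilde f_k - \mathbb{E}[f_k]}{\sqrt{\operatorname{Var}(f_k)}} \leq t\right] \leq \mathbb{P}\left[\frac{\tilde f_k - \mathbb{E}[\tilde f_k]}{\sqrt{\operatorname{Var}(\tilde f_k)}} \leq t-\varepsilon\right] + \mathbb{P}\left[\left|\frac{\tilde f_k - \mathbb{E}[\tilde f_k]}{\sqrt{\operatorname{Var}(\tilde f_k)}} - \frac{\tilde f_k - \mathbb{E}[f_k]}{\sqrt{\operatorname{Var}(f_k)}}\right| > \varepsilon\right] + \mathbb{P}\left[t-\varepsilon < \frac{\tilde f_k - \mathbb{E}[\tilde f_k]}{\sqrt{\operatorname{Var}(\tilde f_k)}} \leq t+\varepsilon\right]. \tag{2}$$

Now, it follows from part (iii) of the claim that the first term of the right-hand side of (2) tends to $\Phi(t-\varepsilon)$ and that the last is asymptotically bounded above by $\Phi(t+\varepsilon) - \Phi(t-\varepsilon)$. For the second term, first require $n$ to be large enough that

$$\frac{\left|\mathbb{E}[\tilde f_k] - \mathbb{E}[f_k]\right|}{\sqrt{\operatorname{Var}(f_k)}} < \frac{\varepsilon}{2}.$$

This condition together with Chebyshev's inequality implies that

$$\mathbb{P}\left[\left|\frac{\tilde f_k - \mathbb{E}[\tilde f_k]}{\sqrt{\operatorname{Var}(\tilde f_k)}} - \frac{\tilde f_k - \mathbb{E}[f_k]}{\sqrt{\operatorname{Var}(f_k)}}\right| > \varepsilon\right] \leq \mathbb{P}\left[\frac{\left|\tilde f_k - \mathbb{E}[\tilde f_k]\right|}{\sqrt{\operatorname{Var}(\tilde f_k)}}\left|1 - \frac{\sqrt{\operatorname{Var}(\tilde f_k)}}{\sqrt{\operatorname{Var}(f_k)}}\right| > \frac{\varepsilon}{2}\right] \leq \frac{4}{\varepsilon^2}\left(1 - \frac{\sqrt{\operatorname{Var}(\tilde f_k)}}{\sqrt{\operatorname{Var}(f_k)}}\right)^2,$$

which tends to zero for fixed $\varepsilon > 0$ by part (i) of the claim. It thus follows that the right-hand side of (2) is asymptotically bounded above by $\Phi(t+\varepsilon)$ as $n \to \infty$; as $\varepsilon$ is arbitrary, this completes the proof of the central limit theorem for $\beta_k$, modulo proof of the claim.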



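To prove part (i) of the claim, first write

$$f_k = \sum_{\substack{A \subseteq \{1,\ldots,n\}\\ |A| = k+1}} \xi_A,$$

where $\xi_A$ is the indicator that $A$ spans a face in $X(n,p)$; that is, that $A$ spans a complete graph in $G(n,p)$. Then, enumerating pairs of subsets of size $k+1$ of $\{1,\ldots,n\}$ by the size $r$ of their intersection,

$$\operatorname{Var}(f_k) = \sum_{A,B}\mathbb{E}[\xi_A\xi_B] - \binom{n}{k+1}^2 p^{2\binom{k+1}{2}} = \binom{n}{k+1}\sum_{r=0}^{k+1}\binom{k+1}{r}\binom{n-k-1}{k+1-r}p^{2\binom{k+1}{2}-\binom{r}{2}} - \binom{n}{k+1}^2 p^{2\binom{k+1}{2}}.$$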



Now, it is not hard to see that in the range of $p$ considered here, only the $r = 0, 1, 2$ terms contribute in the limit; there is cancellation of the terms of order $n^{k+1}$ and $n^k$, so that the main contribution is in fact from the $r = 2$ term and

$$\lim_{n\to\infty} n^{-2k}p^{-\left(2\binom{k+1}{2}-1\right)}\operatorname{Var}(f_k) = c_k \tag{3}$$

for some constant $c_k$ depending only on $k$. From this it follows immediately that

$$\frac{\operatorname{Var}(f_{k-1})}{\operatorname{Var}(f_k)} = o(1) \quad\text{and}\quad \frac{\operatorname{Var}(f_{k+1})}{\operatorname{Var}(f_k)} = o(1)$$

for $p$ in the range specified in the statement of the theorem.
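The variance expansion can be sanity-checked exactly for tiny $n$. In the sketch below (our illustration, with hypothetical function names), `var_faces` evaluates the sum over the intersection size $r$; it subtracts $p^{2\binom{k+1}{2}}$ term by term, which is equivalent to subtracting $\binom{n}{k+1}^2p^{2\binom{k+1}{2}}$ by the Vandermonde identity $\sum_r\binom{k+1}{r}\binom{n-k-1}{k+1-r} = \binom{n}{k+1}$, and it makes the exact vanishing of the $r = 0, 1$ terms visible. `var_faces_bruteforce` averages over all $2^{\binom{n}{2}}$ graphs with exact `Fraction` arithmetic, so it is only practical for $n \leq 6$ or so.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def var_faces(n, p, k):
    """Var(f_k) for X(n, p), summing over the size r of the intersection of two (k+1)-sets."""
    top = 2 * comb(k + 1, 2)
    return comb(n, k + 1) * sum(
        comb(k + 1, r) * comb(n - k - 1, k + 1 - r) * (p ** (top - comb(r, 2)) - p ** top)
        for r in range(k + 2)
    )


def var_faces_bruteforce(n, p, k):
    """Var(f_k) by enumerating every graph on n vertices with its G(n, p) probability."""
    pairs = list(combinations(range(n), 2))
    mean = second = 0
    for mask in range(1 << len(pairs)):
        edges = {e for b, e in enumerate(pairs) if (mask >> b) & 1}
        weight = p ** len(edges) * (1 - p) ** (len(pairs) - len(edges))
        f = sum(all(e in edges for e in combinations(s, 2))
                for s in combinations(range(n), k + 1))
        mean += weight * f
        second += weight * f * f
    return second - mean ** 2
```

With `p = Fraction(1, 3)`, `var_faces(5, p, 2)` and `var_faces_bruteforce(5, p, 2)` agree exactly.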
Expanding the same way as above, it is clear that

$$\operatorname{Cov}(f_k, f_{k+1}) = \binom{n}{k+1}\sum_{r=0}^{k+1}\binom{k+1}{r}\binom{n-k-1}{k+2-r}p^{\binom{k+1}{2}+\binom{k+2}{2}-\binom{r}{2}} - \binom{n}{k+1}\binom{n}{k+2}p^{\binom{k+1}{2}+\binom{k+2}{2}};$$

again there is cancellation of the terms of order $n^{k+2}$ and $n^{k+1}$, so that the leading contribution is from the $r = 2$ term and

$$\lim_{n\to\infty} n^{-2k-1}p^{-\left(\binom{k+1}{2}+\binom{k+2}{2}-1\right)}\operatorname{Cov}(f_k, f_{k+1}) = c_k'$$

for a (different) constant $c_k'$ depending only on $k$. Thus in the range of $p$ being considered,

$$\frac{\operatorname{Cov}(f_k, f_{k+1})}{\operatorname{Var}(f_k)} = o(1).$$

In exactly the same way, one can show that

$$\frac{\operatorname{Cov}(f_k, f_{k-1})}{\operatorname{Var}(f_k)} = o(1) \quad\text{and}\quad \frac{\operatorname{Cov}(f_{k-1}, f_{k+1})}{\operatorname{Var}(f_k)} = o(1),$$

completing the proof of part (i) of the claim.



The proofs of the second and third parts both follow from an abstract normal approximation theorem for dissociated random variables proved (via Stein's method) in [3]. Part (ii) is in fact proved there; the following is a straightforward modification of their proof which obtains a central limit theorem for the lower bound $\tilde f_k$. One can also recover the proof of part (ii) from what is given below, simply by ignoring the extra terms present in $\tilde f_k$ beyond those coming from $f_k$.

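A set $\{X_j : j = (j_1,\ldots,j_r) \in J\}$, for $J$ a set of $r$-tuples, is dissociated if two subcollections of the random variables $\{X_j : j \in K\}$ and $\{X_j : j \in L\}$ are independent whenever $\left(\bigcup_{j\in K}\{j_1,\ldots,j_r\}\right) \cap \left(\bigcup_{j\in L}\{j_1,\ldots,j_r\}\right) = \emptyset$. Let $W := \sum_{j\in J} X_j$ and for each $j \in J$, let $L_j := \{k \in J : \{k_1,\ldots,k_r\} \cap \{j_1,\ldots,j_r\} \neq \emptyset\}$. That is, $L_j$ is a dependency neighborhood for $j$. If $\mathbb{E}X_j = 0$ and $\mathbb{E}W^2 = 1$, then it is shown in [3] that

$$d_1(W, Z) \leq K\sum_{j\in J}\sum_{k,l\in L_j}\left(\mathbb{E}|X_jX_kX_l| + \mathbb{E}|X_jX_k|\,\mathbb{E}|X_l|\right) \tag{4}$$

for an absolute constant $K$, where $Z$ is a standard normal random variable.

To show that $\tilde f_k$ satisfies a central limit theorem, let the index set $J$ be the potential edge sets for complete graphs on $k + \varepsilon$ ($\varepsilon \in \{0,1,2\}$) vertices in $G(n,p)$; that is, an element of $J$ is a $\binom{k+\varepsilon}{2}$-tuple of edges spanning a given set of $k+\varepsilon$ vertices. Each $j \in J$ can thus be associated with its spanning set $A_j$ of vertices. If, writing $\varepsilon_j := |A_j| - k$, the random variables $X_j$ are defined by

$$X_j := \frac{(-1)^{\varepsilon_j+1}}{\sigma}\left(\xi_{A_j} - p^{\binom{k+\varepsilon_j}{2}}\right),$$

where $\sigma^2 = \operatorname{Var}(\tilde f_k)$, then $\{X_j\}$ are evidently dissociated.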

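The second half of the sum from (4) is fairly straightforward to bound in this context. For each $j$, partition $L_j$ into the sets $L_j^{\ell}$ of indices whose spanning sets have size $k+\ell$. Observe that for each $j$, if $\varepsilon_j = |A_j| - k$, then $|L_j^{\ell}| \leq (k+\varepsilon_j)\,n^{k+\ell-1}$, since every spanning set of size $k+\ell$ meeting $A_j$ contains a point of $A_j$ together with $k+\ell-1$ further vertices.

Decomposing as in the variance estimate by the size $r$ of the intersection of $A_j$ and $A_k$ and using the bound above for $|L_j^{\ell}|$ yields

$$\sum_{j\in J}\sum_{k\in L_j^{\varepsilon}}\sum_{l\in L_j^{\ell}}\mathbb{E}|X_jX_k|\,\mathbb{E}|X_l| \leq \sigma^{-3}c_k\,n^{3k+\varepsilon_j+\varepsilon+\ell-3}p^{\binom{k+\varepsilon_j}{2}+\binom{k+\varepsilon}{2}+\binom{k+\ell}{2}-1},$$

since the $r = 2$ term yields the top-order contribution in the range of $p$ considered here. Moreover, it is easy to check that this expression is maximized for $\varepsilon_j = \varepsilon = \ell = 1$. Combining this estimate with (3) shows that the contribution to the error from the second sum is bounded above by

$$\sigma^{-3}c_k\,n^{3k}p^{3\binom{k+1}{2}-1} \leq c_k\sqrt{p},$$

which tends to zero as $n$ tends to infinity.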

The first half of the sum is bounded similarly, although it requires that the intersections of three spanning sets of vertices be considered. Let $r$ denote the number of points common to $A_j$ and $A_k$. Let $p_1 := |A_j \cap A_l \setminus A_k|$, $p_2 := |A_j \cap A_k \cap A_l|$ and $p_3 := |A_k \cap A_l \setminus A_j|$. Then $\mathbb{E}|X_jX_kX_l|$ is at most $c\,\sigma^{-3}$ times $p$ raised to the number of distinct edges of the complete graphs on $A_j$, $A_k$ and $A_l$, where the constant $c$ simply accounts for the fact that the $X_j$ have been centered. The number of ways to choose $j$, $k$ and $l$ is

$$\binom{n}{k+\varepsilon_j}\binom{k+\varepsilon_j}{r}\binom{n-k-\varepsilon_j}{k+\varepsilon_k-r}\binom{k+\varepsilon_j-r}{p_1}\binom{r}{p_2}\binom{k+\varepsilon_k-r}{p_3}\binom{n-2k-\varepsilon_j-\varepsilon_k+r}{k+\varepsilon_l-p_1-p_2-p_3}.$$

Combining these two facts, it is perhaps slightly unpleasant but not too hard to see that the main contribution to the error arises from the case that $r = 2$, $p_1 + p_2 = 2$ (in fact only when $p_1 \neq 0$), and $\varepsilon_j = \varepsilon_k = \varepsilon_l = 1$. It follows that

$$\sum_{j\in J}\sum_{k,l\in L_j}\mathbb{E}|X_jX_kX_l| \leq \sigma^{-3}c_k\,n^{3k-1}p^{3\binom{k+1}{2}-2} \leq \frac{c_k}{n\sqrt{p}},$$

which also tends to zero as $n$ tends to infinity. This completes the proof of part (iii) of the claim, finishing the proof of Theorem 2.4.

□

3. Random Čech complexes

The second model of random simplicial complex considered is the random Čech complex. This is a higher-dimensional analogue of a geometric random graph, constructed explicitly below. In order to analyze this model, we use the same techniques used by Penrose [15] in his study of subgraph counts of random geometric graphs. The additional spatial dependence that is inherent in the random variables we consider presents an additional technical challenge, and means that Penrose's results cannot be applied directly to the problem.
Suppose that $\{X_i\}_{i=1}^{\infty}$ is an i.i.d. sequence of random vectors in $\mathbb{R}^d$ with bounded density $f$. Let $\{r_n\}_{n=1}^{\infty} \subset \mathbb{R}_+$ be such that $nr_n^d \to 0$ (the so-called "sparse" regime of geometric random graphs), and construct a random Čech complex $\mathcal{C}(X_1,\ldots,X_n)$ on $\{X_i\}_{i=1}^n$ as follows. If $|X_i - X_j| < 2r_n$, put an edge between $X_i$ and $X_j$; that is, the 1-skeleton of the complex is a random geometric graph. More generally, make the convex hull of $\{X_{i_1},\ldots,X_{i_k}\}$ a face of the complex if the balls of radius $r_n$ about the points $\{X_{i_1},\ldots,X_{i_k}\}$ have non-trivial intersection.

Definition 3.1. The points $\{x_1,\ldots,x_k\} \subset \mathbb{R}^d$ form an empty $(k-1)$-simplex with respect to $r$ if for each $j_0 \in \{1,\ldots,k\}$, the intersection $\bigcap_{1\leq j\leq k,\, j\neq j_0} B_r(x_j)$ is non-empty, but the intersection $\bigcap_{1\leq j\leq k} B_r(x_j) = \emptyset$.

Let $h_r(x_1,\ldots,x_k)$ be the indicator that $\{x_1,\ldots,x_k\}$ form an empty $(k-1)$-simplex with respect to $r$, and for a multi-index $\mathbf{i} = (i_1,\ldots,i_k)$ with $1 \leq i_1 < \cdots < i_k \leq n$, let $\xi_{\mathbf{i}} = h_{r_n}(X_{i_1},\ldots,X_{i_k})$. Let

$$S_{n,k} := \sum_{\substack{\mathbf{i}=(i_1,\ldots,i_k)\\ 1\leq i_1<\cdots<i_k\leq n}} \xi_{\mathbf{i}};$$

that is, $S_{n,k}$ is the number of empty $(k-1)$-simplices in $\mathcal{C}(X_1,\ldots,X_n)$. Another object of equal importance in what follows is $\tilde S_{n,k}$, the number of isolated empty $k$-simplices. That is, if $\tilde\xi_{(i_1,\ldots,i_k)}$ is the indicator that $\{X_{i_1},\ldots,X_{i_k}\}$ form an empty $(k-1)$-simplex with respect to $r_n$ and that there are no edges between $\{X_j\}_{j\in\{i_1,\ldots,i_k\}}$ and $\{X_j\}_{j\notin\{i_1,\ldots,i_k\}}$, then

$$\tilde S_{n,k} := \sum_{\substack{\mathbf{i}=(i_1,\ldots,i_k)\\ 1\leq i_1<\cdots<i_k\leq n}} \tilde\xi_{\mathbf{i}}.$$
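Definition 3.1 and the counts $S_{n,k}$, $\tilde S_{n,k}$ are easy to test in the simplest case $d = 2$, $k = 3$ (empty triangles). The sketch below is our own illustration, not the article's code, and relies on a standard fact: closed balls of radius $r$ about finitely many points share a common point exactly when the smallest ball enclosing the points has radius at most $r$. For three points in the plane that radius is half the longest side if the triangle is right or obtuse, and the circumradius otherwise. The function names are hypothetical, edges use the strict rule $|x - y| < 2r$ from the construction above, and ties on boundaries are ignored since they occur with probability zero.

```python
import itertools
import math


def enclosing_radius(a, b, c):
    """Radius of the smallest disc containing the planar points a, b, c."""
    s1, s2, s3 = sorted((math.dist(a, b), math.dist(b, c), math.dist(c, a)))
    if s3 ** 2 >= s1 ** 2 + s2 ** 2:
        # Right, obtuse or degenerate triangle: the longest side is a diameter.
        return s3 / 2
    # Acute triangle: the circumcircle, with R = (product of sides) / (4 * area).
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2
    return s1 * s2 * s3 / (4 * area)


def is_empty_triangle(a, b, c, r):
    """All three pairs are edges (|x - y| < 2r) but the three r-balls share no point."""
    pairwise = all(math.dist(x, y) < 2 * r for x, y in ((a, b), (b, c), (c, a)))
    return pairwise and enclosing_radius(a, b, c) > r


def empty_triangle_counts(points, r):
    """Return (S, S_tilde): empty triangles, and those with no edge to any other point."""
    total = isolated = 0
    for tri in itertools.combinations(range(len(points)), 3):
        if not is_empty_triangle(*(points[i] for i in tri), r):
            continue
        total += 1
        others = [j for j in range(len(points)) if j not in tri]
        if all(math.dist(points[i], points[j]) >= 2 * r for i in tri for j in others):
            isolated += 1
    return total, isolated
```

For an equilateral triangle of side 2 the enclosing radius is $2/\sqrt{3} \approx 1.155$, so it is an empty triangle for $r = 1.1$ but not for $r = 1.2$ (the three discs then overlap) or $r = 0.9$ (no edges at all).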

The random variables $S_{n,k}$ and $\tilde S_{n,k}$ are related to $\beta_{k-1}$ as follows. Firstly, $\beta_{k-1}$ is bounded below by the number of isolated empty $k$-simplices; that is, $\beta_{k-1}(\mathcal{C}(X_1,\ldots,X_n)) \geq \tilde S_{n,k}$. Furthermore, any contribution to $\beta_{k-1}$ not coming from an isolated empty $(k-1)$-simplex comes from a component in $\mathcal{C}(X_1,\ldots,X_n)$ on at least $k+1$ vertices. In order for such a component to contribute to $\beta_{k-1}$, it must contain $(k-2)$-dimensional faces. Such faces are necessarily triangulated (by the construction of $\mathcal{C}(X_1,\ldots,X_n)$), and so any further contribution to $\beta_{k-1}$ contains at least one simplex on $k-1$ vertices, with either an extra edge attached to each of two different vertices (terminating in different places), or else an extra path of length two attached to one vertex. Let $Y_{n,k}$ denote the number of simplices in $\mathcal{C}(X_1,\ldots,X_n)$ on $k-1$ vertices with two extra edges attached, counted once for each simplex on $k-1$ vertices which occurs and for each distinct pair of simplex vertices with an extra edge. Similarly, let $Z_{n,k}$ denote the number of simplices in $\mathcal{C}(X_1,\ldots,X_n)$ on $k-1$ vertices with at least one extra path of length 2 attached, counted once for each simplex which occurs and for each vertex with a path of length two attached. The argument above shows that

$$\tilde S_{n,k} \leq \beta_{k-2}(\mathcal{C}(X_1,\ldots,X_n)) \leq S_{n,k} + Y_{n,k} + Z_{n,k}, \tag{5}$$

where the trivial bound $\tilde S_{n,k} \leq S_{n,k}$ has also been used.
The limiting distribution of $\beta_{k-1}$ will follow as in the previous section by proving the same limit theorems for the upper and lower bounds of (5). The theorem is the following.

Theorem 3.2.

(i) If $n^k r_n^{d(k-1)} \to 0$ as $n \to \infty$, then $\beta_k(\mathcal{C}(X_1,\ldots,X_n)) = 0$ a.a.s. as $n \to \infty$.

(ii) If $n^k r_n^{d(k-1)} \to \alpha \in (0,\infty)$ as $n \to \infty$, then

$$d_{TV}\left(\beta_k(\mathcal{C}(X_1,\ldots,X_n)), Y\right) \leq c\,nr_n^d,$$

where $Y$ is a Poisson random variable with $\mathbb{E}[Y] = \mathbb{E}[\beta_k]$ and $c$ is a constant depending only on $d$, $k$, and $f$.

(iii) If $n^k r_n^{d(k-1)} \to \infty$ as $n \to \infty$ and $nr_n^d \to 0$ as $n \to \infty$, then

$$\frac{\beta_k(\mathcal{C}(X_1,\ldots,X_n)) - \mathbb{E}[\beta_k(\mathcal{C}(X_1,\ldots,X_n))]}{\sqrt{\operatorname{Var}(\beta_k(\mathcal{C}(X_1,\ldots,X_n)))}} \Rightarrow N(0,1).$$

The first step in proving Theorem 3.2 is to determine the order in $n$ and $r_n$ of $\mathbb{E}[\tilde S_{n,k}]$ and $\mathbb{E}[S_{n,k} + Y_{n,k} + Z_{n,k}]$. In fact, slightly more is needed. Let $A$ be an open subset of $\mathbb{R}^d$ such that $\operatorname{vol}(\partial A) = 0$. Let $\mathcal{X}$ be a finite subset of $\mathbb{R}^d$, and call $x \in \mathcal{X}$ the "left-most" point of $\mathcal{X}$ (denoted $\mathrm{LMP}(\mathcal{X})$) if $x$ is the first element of $\mathcal{X}$ when $\mathcal{X}$ is ordered lexicographically. Now, define $S_{n,k,A}$ to be the number of empty $(k-1)$-simplices formed from $X_1,\ldots,X_n$ such that the left-most point of the simplex is in $A$. Define $\tilde S_{n,k,A}$ in the analogous way.

Lemma 3.3. For $k > 1$, let

$$\mu_A := \left(\int_A f(x)^k\,dx\right)\int_{(\mathbb{R}^d)^{k-1}} h_1(0, y_2, \ldots, y_k)\,d(y_2,\ldots,y_k).$$

Then

$$\lim_{n\to\infty} n^{-k}r_n^{-d(k-1)}\mathbb{E}[S_{n,k,A}] = \lim_{n\to\infty} n^{-k}r_n^{-d(k-1)}\mathbb{E}[\tilde S_{n,k,A}] = \frac{\mu_A}{k!}.$$

Observe that $\mu_A$ depends only on $f$ and $A$ and can be trivially bounded by $\|f\|_\infty^{k-1}(2^d\theta_d)^{k-1}$, where $\theta_d$ is the volume of the unit ball in $\mathbb{R}^d$.

Lemma 3.4. Let

$$\mu' := \left(\int_{\mathbb{R}^d} f(x)^{k+1}\,dx\right)\int_{(\mathbb{R}^d)^k} g_1^{1,2}(0, y_1, \ldots, y_k)\,dy_1\cdots dy_k,$$

where $g_1^{1,2}(x_0,\ldots,x_k)$ is the indicator that $\{x_0,\ldots,x_{k-2}\}$ form a simplex (where a complex is built as described on $x_0,\ldots,x_k$ with threshold radius 1) and that $\{x_0,x_{k-1}\}$ and $\{x_1,x_k\}$ are edges. Let

$$\mu'' := \left(\int_{\mathbb{R}^d} f(x)^{k+1}\,dx\right)\int_{(\mathbb{R}^d)^k} k_1(0, y_1, \ldots, y_k)\,dy_1\cdots dy_k,$$

where $k_1(x_0,\ldots,x_k)$ is the indicator that $\{x_0,\ldots,x_{k-2}\}$ form a simplex and that $\{x_0,x_{k-1}\}$ and $\{x_{k-1},x_k\}$ are edges. Then

$$\lim_{n\to\infty} n^{-(k+1)}r_n^{-dk}\mathbb{E}[Y_{n,k}] = \frac{\mu'}{2(k-3)!}, \qquad \lim_{n\to\infty} n^{-(k+1)}r_n^{-dk}\mathbb{E}[Z_{n,k}] = \frac{\mu''}{(k-2)!}.$$

Corollary 3.5. For $S_{n,k}$, $Y_{n,k}$, $Z_{n,k}$ as above,

$$\mathbb{E}[S_{n,k} + Y_{n,k} + Z_{n,k}] \sim \mathbb{E}[S_{n,k}].$$

The proofs of these facts are identical to the proofs of the corresponding facts for subgraph counts of random geometric graphs given in Chapter 3 of [15].



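This last corollary is already enough to prove part (i) of Theorem 3.2: if $n^k r_n^{d(k-1)} \to 0$ as $n \to \infty$, then

$$\mathbb{P}\left[\beta_k(\mathcal{C}(X_1,\ldots,X_n)) \geq 1\right] \leq \mathbb{E}[\beta_k(\mathcal{C}(X_1,\ldots,X_n))] \leq \mathbb{E}[S_{n,k} + Y_{n,k} + Z_{n,k}] \to 0.$$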



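In order to prove part (ii), the following abstract approximation theorem of Arratia, Goldstein, and Gordon is needed.

Theorem 3.6 ([1]). Let $\{\xi_i\}_{i\in I}$ be a finite collection of Bernoulli random variables with dependency graph $(I, \sim)$. Let $p_i := \mathbb{E}[\xi_i]$ and $p_{ij} := \mathbb{E}[\xi_i\xi_j]$. Let $\lambda := \sum_{i\in I} p_i$ and let $W := \sum_{i\in I}\xi_i$. Then

$$d_{TV}(W, \mathrm{Poi}(\lambda)) \leq \min(3, \lambda^{-1})\left(\sum_{i\in I}\sum_{j\in N_i} p_ip_j + \sum_{i\in I}\sum_{j\in N_i\setminus\{i\}} p_{ij}\right),$$

where $N_i := \{i\} \cup \{j \in I : j \sim i\}$.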



Penrose [15] used this theorem to prove Poisson approximation results for subgraph counts of random geometric graphs; one can follow this approach essentially without change to prove the following result, which holds in the entire sparse regime.
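The right-hand side of the bound in Theorem 3.6 is straightforward to evaluate once the $p_i$, the $p_{ij}$ and the neighborhoods $N_i$ are known. The sketch below is hypothetical helper code (not from the article or from [1]) that assumes the bound in the form $\min(3,\lambda^{-1})(b_1 + b_2)$ displayed above. As a toy example, not one considered in the article, it treats the number of indices $i$ with $\eta_i = \eta_{i+1} = 1$ in an i.i.d. Bernoulli($q$) sequence, where $p_i = q^2$, the neighbors of $i$ are $i \pm 1$, and $p_{i,i\pm1} = q^3$.

```python
def poisson_error_bound(p, p_pair, neighbors):
    """min(3, 1/lam) * (sum_i sum_{j in N_i} p_i p_j + sum_i sum_{j in N_i, j != i} p_ij).

    p         -- dict mapping each index i to p_i = E[xi_i]
    p_pair    -- function (i, j) -> p_ij = E[xi_i xi_j], called only for j ~ i, j != i
    neighbors -- dict mapping i to the set of j ~ i (i itself excluded)
    """
    lam = sum(p.values())
    b1 = sum(p[i] * p[j] for i in p for j in neighbors[i] | {i})
    b2 = sum(p_pair(i, j) for i in p for j in neighbors[i])
    return min(3.0, 1.0 / lam) * (b1 + b2)


def two_runs_bound(m, q):
    """Bound for W = #{0 <= i < m : eta_i = eta_{i+1} = 1}, eta_0, ..., eta_m i.i.d. Bernoulli(q)."""
    p = {i: q * q for i in range(m)}
    neighbors = {i: {j for j in (i - 1, i + 1) if 0 <= j < m} for i in range(m)}
    # xi_i and xi_{i+1} share eta_{i+1}, so E[xi_i xi_{i+1}] = q^3.
    return poisson_error_bound(p, lambda i, j: q ** 3, neighbors)


print(two_runs_bound(1000, 0.01))
```

With no dependence at all (every neighbor set empty) the bound reduces to $\min(3,\lambda^{-1})\sum_i p_i^2$, a Le Cam-type estimate.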

Theorem 3.7. With definitions as above,

$$d_{TV}\left(S_{n,k}, \mathrm{Poi}(\mathbb{E}[S_{n,k}])\right) \leq c_{d,k,f}\left[nr_n^d\right]$$

for a constant $c_{d,k,f}$ depending only on $d$, $k$, and $\|f\|_\infty$.

Corollary 3.8. If $n^k r_n^{d(k-1)} \to \alpha \in (0,\infty)$ as $n \to \infty$, then

$$d_{TV}\left(\tilde S_{n,k}, \mathrm{Poi}(\mathbb{E}[\tilde S_{n,k}])\right) \leq c_{d,k,f}\,\alpha\,(nr_n^d).$$

That is, in the regime of part (ii) of the theorem, the lower bound for $\beta_k$ given in (5) is approximately Poisson.

Proof. Note that $S_{n,k} - \tilde S_{n,k}$ is the number of empty $(k-1)$-simplices among $\{X_1,\ldots,X_n\}$ which are not isolated, and is thus bounded above by the number of connected subsets of $\{X_1,\ldots,X_n\}$ with $k+1$ points, $k$ of which form an empty $k$-simplex. The expected number of such sets is bounded by

$$\binom{n}{k+1}(k+1)\,\|f\|_\infty^{k}\,\theta_d^{k}\,(2r_n)^{d(k-1)}(4r_n)^{d} \leq c_{d,k,f}\,n^{k+1}r_n^{dk},$$

so that

$$d_{TV}(S_{n,k}, \tilde S_{n,k}) = \sup_A\left|\mathbb{P}[S_{n,k}\in A] - \mathbb{P}[\tilde S_{n,k}\in A]\right| = \sup_A\left|\mathbb{P}[S_{n,k}\in A,\, S_{n,k}\neq\tilde S_{n,k}] - \mathbb{P}[\tilde S_{n,k}\in A,\, S_{n,k}\neq\tilde S_{n,k}]\right| \leq c_{d,k,f}\,n^{k+1}r_n^{dk} \leq c_{d,k,f}\,\alpha\,nr_n^d.$$

Moreover, it is easy to see in general that if $Y_a$ and $Y_b$ have Poisson distributions with means $a$ and $b$, respectively, then $d_{TV}(Y_a, Y_b) \leq |a - b|$, and so

$$d_{TV}\left(\mathrm{Poi}(\mathbb{E}[S_{n,k}]), \mathrm{Poi}(\mathbb{E}[\tilde S_{n,k}])\right) \leq c_{d,k,f}\,\alpha\,nr_n^d$$

as well.

□
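The fact $d_{TV}(Y_a, Y_b) \leq |a-b|$ used in the proof above follows from a coupling: if $a \leq b$, then $Y_b$ has the same law as $Y_a + Y'$ with $Y' \sim \mathrm{Poi}(b-a)$ independent of $Y_a$, and the two differ only when $Y' \neq 0$, an event of probability $1 - e^{-(b-a)} \leq b - a$. The sketch below (our illustration; function names hypothetical) checks this numerically, using the event form $d_{TV}(X,Y) = \sup_A|\mathbb{P}[X\in A]-\mathbb{P}[Y\in A]| = \frac12\sum_j|\mathbb{P}[X=j]-\mathbb{P}[Y=j]|$ that appears in the proof of Corollary 3.8, and truncating the sum far out in the upper tail.

```python
import math


def poisson_pmf(lam, j):
    """P[Poi(lam) = j], computed in log space to avoid overflow (Poi(0) is a point mass at 0)."""
    if lam == 0:
        return 1.0 if j == 0 else 0.0
    return math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))


def tv_poisson(a, b):
    """sup_A |P[Poi(a) in A] - P[Poi(b) in A]| = (1/2) * sum_j |P[Poi(a) = j] - P[Poi(b) = j]|."""
    top = max(a, b)
    cutoff = int(top + 20 * math.sqrt(top) + 40)  # the mass beyond this point is negligible
    return 0.5 * sum(abs(poisson_pmf(a, j) - poisson_pmf(b, j)) for j in range(cutoff + 1))


for a, b in [(0.0, 0.3), (2.0, 2.5), (50.0, 52.0)]:
    print(a, b, round(tv_poisson(a, b), 4), abs(a - b))
```

The case $a = 0$ shows the coupling bound $1 - e^{-|a-b|}$ can be attained.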



The following result, proved below using Theorem 3.6, holds throughout the sparse regime.

Theorem 3.9. There is a constant $c_{d,k,f}$ depending on $d$, $k$, and $f$ only, so that with $S_{n,k}$, $Y_{n,k}$, $Z_{n,k}$ as above,

$$d_{TV}\left(S_{n,k} + Y_{n,k} + Z_{n,k}, \mathrm{Poi}(\mathbb{E}[S_{n,k}])\right) \leq c_{d,k,f}\,nr_n^d.$$

The inequalities in (5) together with Corollary 3.8 and Theorem 3.9 yield part (ii) almost immediately.



Proof of part (ii) of Theorem 3.2. By the left-hand inequality in (5) and Corollary 3.8,

$$\mathbb{P}[\beta_{k-1} \leq m] \leq \mathbb{P}[\tilde S_{n,k} \leq m] \leq \mathbb{P}[Y \leq m] + c_{d,k,f}\,nr_n^d,$$

where $Y$ is a Poisson random variable with mean $\mathbb{E}[\tilde S_{n,k}]$. By the right-hand inequality in (5) and Theorem 3.9,

$$\mathbb{P}[\beta_{k-1} \leq m] \geq \mathbb{P}[S_{n,k} + Y_{n,k} + Z_{n,k} \leq m] \geq \mathbb{P}[Y \leq m] - c_{d,k,f}\,nr_n^d.$$

As in the previous proof, $Y$ can be replaced by a Poisson random variable with mean $\mathbb{E}[\beta_k(\mathcal{C}(X_1,\ldots,X_n))]$ with only a change of constant in the error term. □

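Proof of Theorem 3.9. For notational convenience, let $W_{n,k} := S_{n,k} + Y_{n,k} + Z_{n,k}$. For $1 \leq p < q \leq k-1$, let $g_{r_n}^{p,q}(x_1,\ldots,x_{k+1})$ be the indicator that $\{x_1,\ldots,x_{k-1}\}$ form a simplex (where a complex is built as described on $x_1,\ldots,x_{k+1}$ with threshold radius $r_n$) and that $\{x_p,x_k\}$ and $\{x_q,x_{k+1}\}$ are edges. Let $k_{r_n}^{p}(x_1,\ldots,x_{k+1})$ be the indicator that $\{x_1,\ldots,x_{k-1}\}$ form a simplex and that $\{x_p,x_k\}$ and $\{x_k,x_{k+1}\}$ are edges. For $\mathbf{j} = (j_1,\ldots,j_{k+1})$, let $\gamma_{\mathbf{j}}^{p,q} = g_{r_n}^{p,q}(X_{j_1},\ldots,X_{j_{k+1}})$ and let $\eta_{\mathbf{j}}^{p} = k_{r_n}^{p}(X_{j_1},\ldots,X_{j_{k+1}})$. Then

$$W_{n,k} = \sum_{1\leq i_1<\cdots<i_k\leq n}\xi_{\mathbf{i}} + \sum_{\substack{1\leq j_1<\cdots<j_{k-1}\leq n\\ j_k,\,j_{k+1}\notin\{j_1,\ldots,j_{k-1}\}}}\ \sum_{1\leq p<q\leq k-1}\gamma_{\mathbf{j}}^{p,q} + \sum_{\substack{1\leq j_1<\cdots<j_{k-1}\leq n\\ j_k,\,j_{k+1}\notin\{j_1,\ldots,j_{k-1}\}}}\ \sum_{1\leq p\leq k-1}\eta_{\mathbf{j}}^{p}.$$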
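The proof that $W_{n,k}$ has an approximate Poisson distribution proceeds along the same lines as the proof given by Penrose for subgraph counts. For the Bernoulli random variables in the sum above, one can take a dependency graph to be $\mathbf{i} \sim \mathbf{j}$ if $\mathbf{i} \cap \mathbf{j} \neq \emptyset$. (Abusing notation, $\mathbf{i}$ is also used here to denote the set of indices from the multi-index $\mathbf{i}$.) Note that it is not important that $\mathbf{i}$ and $\mathbf{j}$ be the same size.

Now, $\mathbb{E}[\xi_{\mathbf{i}}] \leq \left[(2r_n)^d\theta_d\|f\|_\infty\right]^{k-1}$, and if $|\mathbf{i}\cap\mathbf{i}'| = \ell$, then

$$\mathbb{E}[\xi_{\mathbf{i}}\xi_{\mathbf{i}'}] \leq \left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k-\ell-1},$$

since if a set of $k$ points forms a simplex, they must all be in the ball of radius $2r_n$ about the first point. Given $\mathbf{i} = (i_1,\ldots,i_k)$, the number of $\mathbf{i}' = (i_1',\ldots,i_k')$ with $\mathbf{i} \sim \mathbf{i}'$ (including $\mathbf{i}$ itself) is

$$\binom{n}{k} - \binom{n-k}{k} = \frac{k^2 n^{k-1}}{k!} + O(n^{k-2});$$

for $\mathbf{i}$ as above, the number of $\mathbf{i}' = (i_1',\ldots,i_k')$ with $|\mathbf{i}\cap\mathbf{i}'| = \ell$ is

$$\binom{k}{\ell}\binom{n-k}{k-\ell} = \binom{k}{\ell}\frac{n^{k-\ell}}{(k-\ell)!} + O(n^{k-\ell-1}).$$

This means that the contribution to the error term (without the $\min(3,\lambda^{-1})$ factor in front) from Theorem 3.6 of the form $p_{\mathbf{i}}p_{\mathbf{i}'}$ for $\mathbf{i} \sim \mathbf{i}'$ is, to top order in $n$,

$$\binom{n}{k}\frac{k^2 n^{k-1}}{k!}\left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k-2},$$

and the contribution from terms of the form $p_{\mathbf{i}\mathbf{i}'}$ is (to top order)

$$\binom{n}{k}\sum_{\ell=1}^{k-1}\binom{k}{\ell}\frac{n^{k-\ell}}{(k-\ell)!}\left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k-\ell-1}.$$

Similar to above, $\mathbb{E}[\gamma_{\mathbf{j}}^{p,q}] \leq 2^d\left[(2r_n)^d\theta_d\|f\|_\infty\right]^k$, and if $|\mathbf{j}\cap\mathbf{j}'| = \ell$, then

$$\mathbb{E}[\gamma_{\mathbf{j}}^{p,q}\gamma_{\mathbf{j}'}^{p',q'}] \leq 2^{3d}\left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k+1-\ell}.$$

Given $\mathbf{j} = (j_1,\ldots,j_{k+1})$, the number of $\mathbf{j}' = (j_1',\ldots,j_{k+1}')$ with $\mathbf{j} \sim \mathbf{j}'$ is

$$\frac{(k+1)^2 n^k}{(k+1)!} + O(n^{k-1}),$$

and the number of $\mathbf{j}'$ with $|\mathbf{j}\cap\mathbf{j}'| = \ell$ is

$$\binom{k+1}{\ell}\frac{n^{k+1-\ell}}{(k+1-\ell)!} + O(n^{k-\ell}).$$

This yields a top-order contribution to the error from Theorem 3.6 from the $\mathbb{E}[\gamma_{\mathbf{j}}]\mathbb{E}[\gamma_{\mathbf{j}'}]$ and $\mathbb{E}[\gamma_{\mathbf{j}}\gamma_{\mathbf{j}'}]$ terms of order

$$\frac{(k+1)^2 n^{2k+1}}{[(k+1)!]^2}\binom{k-1}{2}^2 2^{2d}\left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k} \lesssim n^{k+1}r_n^{dk}.$$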
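In the same way, $\mathbb{E}[\eta_{\mathbf{j}}^{p}] \leq 2^d\left[(2r_n)^d\theta_d\|f\|_\infty\right]^k$, and if $|\mathbf{j}\cap\mathbf{j}'| = \ell$, then

$$\mathbb{E}[\eta_{\mathbf{j}}^{p}\eta_{\mathbf{j}'}^{p'}] \leq 2^{3d}\left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k+1-\ell};$$

thus the contribution from the terms of the form $\mathbb{E}[\eta_{\mathbf{j}}]\mathbb{E}[\eta_{\mathbf{j}'}]$ and of the form $\mathbb{E}[\eta_{\mathbf{j}}\eta_{\mathbf{j}'}]$ is of the same order as the contribution above from the corresponding $\gamma$ terms. The cross terms are essentially the same: if $|\mathbf{i}\cap\mathbf{j}| = \ell$, then

$$\mathbb{E}[\xi_{\mathbf{i}}\gamma_{\mathbf{j}}^{p,q}] \leq 2^{2d}\left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k-\ell}, \qquad \mathbb{E}[\xi_{\mathbf{i}}\eta_{\mathbf{j}}^{p}] \leq 2^{3d}\left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k-\ell},$$

$$\mathbb{E}[\gamma_{\mathbf{j}}^{p,q}\eta_{\mathbf{j}'}^{p'}] \leq 2^{4d}\left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k+1-\ell}.$$

The number of $\mathbf{j} = (j_1,\ldots,j_{k+1})$ with $\mathbf{i} \sim \mathbf{j}$ is

$$\binom{n}{k+1} - \binom{n-k}{k+1} = \frac{n^k}{(k-1)!} + O(n^{k-1}),$$

and the number of such $\mathbf{j}$ with $|\mathbf{i}\cap\mathbf{j}| = \ell$ is

$$\binom{k}{\ell}\binom{n-k}{k+1-\ell} = \binom{k}{\ell}\frac{n^{k+1-\ell}}{(k+1-\ell)!} + O(n^{k-\ell}).$$

This yields a contribution from the $\xi$–$\gamma$ cross terms of

$$\binom{n}{k}\sum_{\ell=1}^{k}\binom{k-1}{2}\binom{k}{\ell}\frac{n^{k+1-\ell}}{(k+1-\ell)!}2^{2d}\left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k-\ell} \lesssim n^{k+1}r_n^{dk}.$$

The contribution from the $\xi$–$\eta$ cross terms is the same up to constants depending only on $k$ and $d$, and the contribution from the $\gamma$–$\eta$ cross terms is

$$\frac{n^{k+1}}{(k+1)!}(k-1)\binom{k-1}{2}\sum_{\ell=1}^{k+1}\binom{k+1}{\ell}\frac{n^{k+1-\ell}}{(k+1-\ell)!}2^{4d}\left[(2r_n)^d\theta_d\|f\|_\infty\right]^{2k+1-\ell} \lesssim n^{k+1}r_n^{dk}.$$

Collecting terms and using that $\lambda = \mathbb{E}[W_{n,k}] \sim n^k r_n^{d(k-1)}$, Theorem 3.6 yields

$$d_{TV}(W_{n,k}, \mathrm{Poi}(\lambda)) \leq c_{d,k,f}\,nr_n^d.$$

Again, one can replace $\lambda$ with $\mathbb{E}[S_{n,k}]$ with only a loss in the value of the constant $c_{d,k,f}$.

□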



The remainder of the section is devoted to the proof of part (iii) of Theorem 3.2. A central limit theorem for the recentered, renormalized upper bound of $\beta_k$ given in (5) follows immediately from Theorem 3.9 in this range of $r_n$, by the classical result that a Poisson random variable with mean tending to infinity tends to a Gaussian random variable when recentered and renormalized.
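The classical fact invoked here can be observed numerically. The sketch below (our illustration; names hypothetical) computes the Kolmogorov distance $\sup_x|\mathbb{P}[(Y-\lambda)/\sqrt\lambda \leq x] - \Phi(x)|$ for $Y \sim \mathrm{Poi}(\lambda)$. It uses the fact that for an integer-valued variable the supremum is attained at, or just before, one of the jump points, since between jumps the Poisson distribution function is constant while $\Phi$ increases. The distance decays roughly like $\lambda^{-1/2}$.

```python
import math


def poisson_pmf(lam, j):
    """P[Poi(lam) = j] for lam > 0, computed in log space to avoid overflow."""
    return math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kolmogorov_to_normal(lam):
    """sup_x |P[(Y - lam)/sqrt(lam) <= x] - Phi(x)| for Y ~ Poi(lam), lam > 0."""
    sd = math.sqrt(lam)
    cdf, worst = 0.0, 0.0
    for j in range(int(lam + 12 * sd + 12) + 1):
        phi = std_normal_cdf((j - lam) / sd)
        worst = max(worst, abs(cdf - phi))  # left limit at the jump point j
        cdf += poisson_pmf(lam, j)
        worst = max(worst, abs(cdf - phi))  # value at the jump point j
    return worst


for lam in (4, 25, 100, 400):
    print(lam, round(kolmogorov_to_normal(lam), 4))
```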
Theorem 3.10. If $nr_n^d \to 0$ and $n^k r_n^{d(k-1)} \to \infty$, then

$$\frac{S_{n,k} + Y_{n,k} + Z_{n,k} - \mathbb{E}[S_{n,k}]}{\sqrt{\mathbb{E}[S_{n,k}]}} \Rightarrow N(0,1)$$

as $n$ tends to infinity.



Clearly the approach to the lower bound of (5) taken in the regime in which $n^k r_n^{d(k-1)} \to \alpha \in (0,\infty)$ also works in the case that $n^k r_n^{d(k-1)}$ tends to infinity but $n^{k+1}r_n^{dk}$ tends to zero, to show that $\tilde S_{n,k}$ is approximately Gaussian in that regime as well. However, to deal with the regime in which $r_n = o(n^{-1/d})$ but $n^{k+1}r_n^{dk}$ is bounded away from zero, a different argument is needed for the lower bound of (5). Following Penrose, the approach taken here is to consider the Poissonized version of the problem (the vertices distributed as a Poisson process of intensity $nf(\cdot)$ instead of i.i.d. with density $f$), and then to recover the i.i.d. case.

Let $N_n$ be a Poisson random variable with mean $n$, and let $\mathcal{P}_n = \{X_1, \ldots, X_{N_n}\}$, where $\{X_i\}_{i=1}^\infty$ is an i.i.d. sequence of random points in $\mathbb{R}^d$ with density $f$, independent of $N_n$. Then $\mathcal{P}_n$ is a Poisson process with intensity $n f(\cdot)$, and one can define $S^{\mathcal{P}}_{n,k}$ and $S^{\mathcal{P}}_{n,k,A}$ for the random points $\mathcal{P}_n$ analogously to the earlier definitions. In what follows, assume that $k \ge 3$; that is, the empty $(k-1)$-simplices are at least empty triangles. Empty 1-simplices are simply pairs of vertices which are not connected, and different arguments are needed in that case.
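As a concrete illustration of what $S_{n,k}$ counts when $k = 3$, here is a brute-force Python sketch for points in the plane. It assumes the convention that two vertices are joined when their distance is less than $2r$ and that a triangle is filled exactly when the three closed balls of radius $r$ share a point (equivalently, when the smallest disk containing the three centers has radius at most $r$); the restriction of the left-most vertex to a set $A$ is omitted. The function names are hypothetical.

```python
import math
from itertools import combinations


def enclosing_radius(a, b, c):
    """Radius of the smallest disk containing three points in the plane."""
    x, y, z = sorted([math.dist(a, b), math.dist(b, c), math.dist(a, c)])
    if z * z >= x * x + y * y:
        # Right, obtuse or degenerate triangle: the longest side is a diameter.
        return z / 2
    # Acute triangle: the smallest enclosing disk is the circumscribed disk.
    s = (x + y + z) / 2
    area = math.sqrt(s * (s - x) * (s - y) * (s - z))
    return x * y * z / (4 * area)


def count_isolated_empty_triangles(points, r):
    """Number of 3-point sets that are pairwise connected (distance < 2r),
    whose three radius-r balls have no common point, and which have no edge
    to any other point: the k = 3 case of S_{n,k}, without the set A."""
    n = len(points)

    def adjacent(i, j):
        return math.dist(points[i], points[j]) < 2 * r

    count = 0
    for tri in combinations(range(n), 3):
        if not all(adjacent(i, j) for i, j in combinations(tri, 2)):
            continue  # not even a hollow triangle in the 1-skeleton
        if enclosing_radius(*(points[i] for i in tri)) <= r:
            continue  # the balls share a point, so the 2-simplex is filled
        if any(adjacent(v, i) for v in range(n) if v not in tri for i in tri):
            continue  # connected to some other vertex: not isolated
        count += 1
    return count


triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(count_isolated_empty_triangles(triangle, 0.55))                  # hollow: 1
print(count_isolated_empty_triangles(triangle, 0.60))                  # filled: 0
print(count_isolated_empty_triangles(triangle + [(0.5, -0.5)], 0.55))  # not isolated: 0
```

For the unit equilateral triangle the circumradius is $1/\sqrt{3} \approx 0.577$, so the triangle is hollow for $r = 0.55$ and filled for $r = 0.6$.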

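In order to compute expectations for the expressions which arise in the Poissonized case, the following results are useful.

Theorem 3.11 (See [15]). Let $\lambda > 0$ and let $\mathcal{P}_\lambda$ be a Poisson process with intensity $\lambda f(\cdot)$. Let $j \in \mathbb{N}$, and suppose that $h(\mathcal{Y}, \mathcal{X})$ is a bounded measurable function on pairs $(\mathcal{Y}, \mathcal{X})$ with $\mathcal{X}$ a finite subset of $\mathbb{R}^d$ and $\mathcal{Y} \subseteq \mathcal{X}$, such that $h(\mathcal{Y}, \mathcal{X}) = 0$ unless $|\mathcal{Y}| = j$. Then
$$\mathbb{E}\Big[\sum_{\mathcal{Y} \subseteq \mathcal{P}_\lambda} h(\mathcal{Y}, \mathcal{P}_\lambda)\Big] = \frac{\lambda^j}{j!}\, \mathbb{E}\big[h(\mathcal{X}'_j, \mathcal{X}'_j \cup \mathcal{P}_\lambda)\big],$$
where $\mathcal{X}'_j$ is a set of $j$ i.i.d. points in $\mathbb{R}^d$ with density $f$, independent of $\mathcal{P}_\lambda$.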
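From this, one can prove the following.

Theorem 3.12. Let $\lambda > 0$ and $k, j_1, \ldots, j_k \in \mathbb{N}$; define $j := \sum_{i=1}^k j_i$. For $1 \le i \le k$, suppose $h_i(\mathcal{Y}, \mathcal{X})$ is a bounded measurable function of pairs $(\mathcal{Y}, \mathcal{X})$ of finite subsets of $\mathbb{R}^d$ with $\mathcal{Y} \subseteq \mathcal{X}$, such that $h_i(\mathcal{Y}, \mathcal{X}) = 0$ if $|\mathcal{Y}| \neq j_i$. Then
$$\mathbb{E}\Bigg[\sum_{\substack{\mathcal{Y}_1, \ldots, \mathcal{Y}_k \subseteq \mathcal{P}_\lambda \\ \mathcal{Y}_i \cap \mathcal{Y}_l = \emptyset \text{ for } i \neq l}} \prod_{i=1}^k h_i(\mathcal{Y}_i, \mathcal{P}_\lambda)\Bigg] = \frac{\lambda^j}{\prod_{i=1}^k j_i!}\, \mathbb{E}\Bigg[\prod_{i=1}^k h_i\big(\mathcal{X}'^{(i)}, \mathcal{X}'_j \cup \mathcal{P}_\lambda\big)\Bigg],$$
where $\mathcal{X}'_j = \{X'_1, \ldots, X'_j\}$ are $j$ i.i.d. points in $\mathbb{R}^d$ with density $f$, $\mathcal{X}'^{(i)} := \{X'_l : j_1 + \cdots + j_{i-1} < l \le j_1 + \cdots + j_i\}$ is the $i$-th block of $j_i$ of these points, $\mathcal{P}_\lambda$ is a Poisson process with intensity $\lambda f(\cdot)$, and $\mathcal{X}'_j$ and $\mathcal{P}_\lambda$ are independent.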

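Proof. Consider the case $k = 2$ for simplicity (the case of larger $k$ is the same with more notation). Define $h(\mathcal{Y}, \mathcal{X})$ on subsets $\mathcal{Y}$ of $\mathcal{X}$ of size $j_1 + j_2$ by
$$h(\mathcal{Y}, \mathcal{X}) := \sum_{\mathcal{Y}_1 \subseteq \mathcal{Y}} h_1(\mathcal{Y}_1, \mathcal{X})\, h_2(\mathcal{Y} \setminus \mathcal{Y}_1, \mathcal{X}).$$
Then by Theorem 3.11,
$$\begin{aligned}
\mathbb{E}\Big[\sum_{\mathcal{Y}_1, \mathcal{Y}_2 \subseteq \mathcal{P}_\lambda} h_1(\mathcal{Y}_1, \mathcal{P}_\lambda)\, h_2(\mathcal{Y}_2, \mathcal{P}_\lambda)\, \mathbb{1}\{\mathcal{Y}_1 \cap \mathcal{Y}_2 = \emptyset\}\Big]
&= \mathbb{E}\Big[\sum_{\mathcal{Y} \subseteq \mathcal{P}_\lambda} h(\mathcal{Y}, \mathcal{P}_\lambda)\Big] \\
&= \frac{\lambda^{j_1 + j_2}}{(j_1 + j_2)!}\, \mathbb{E}\big[h(\mathcal{X}'_j, \mathcal{X}'_j \cup \mathcal{P}_\lambda)\big] \\
&= \frac{\lambda^{j_1 + j_2}}{j_1!\, j_2!}\, \mathbb{E}\big[h_1(\mathcal{X}'_{j_1}, \mathcal{X}'_j \cup \mathcal{P}_\lambda)\, h_2(\mathcal{X}'_j \setminus \mathcal{X}'_{j_1}, \mathcal{X}'_j \cup \mathcal{P}_\lambda)\big],
\end{aligned}$$
where the last step holds because, by exchangeability of the points of $\mathcal{X}'_j$, each of the $\binom{j_1 + j_2}{j_1}$ terms of $h(\mathcal{X}'_j, \cdot)$ with $|\mathcal{Y}_1| = j_1$ has the same expectation.

□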
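Theorem 3.11 can be sanity-checked by simulation. The sketch below assumes $d = 1$, $f$ the uniform density on $[0,1]$, $j = 2$, and $h$ the indicator that two points lie within distance $\varepsilon$ of each other, so the right-hand side is $\frac{\lambda^2}{2}(2\varepsilon - \varepsilon^2)$. The Poisson sampler is Knuth's multiplication method, chosen only to keep the example dependency-free; all names are illustrative.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def close_pairs(points, eps):
    """Number of unordered pairs of points on the line at distance < eps."""
    pts = sorted(points)
    count = 0
    for i, x in enumerate(pts):
        j = i + 1
        while j < len(pts) and pts[j] - x < eps:
            count += 1
            j += 1
    return count


def mecke_check(lam=20.0, eps=0.05, trials=20000, seed=1):
    """Compare both sides of Theorem 3.11 for j = 2 and uniform f on [0, 1]."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        n = sample_poisson(lam, rng)  # N ~ Poisson(lam), then N uniform points
        total += close_pairs([rng.random() for _ in range(n)], eps)
    simulated = total / trials
    # Right-hand side: lam^j / j! * P[|U - V| < eps], with P[...] = 2 eps - eps^2.
    predicted = lam ** 2 / 2 * (2 * eps - eps ** 2)
    return simulated, predicted


print(mecke_check())
```

With the default parameters the prediction is $19.5$, and the Monte Carlo average should land close to it.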



One can apply these results to compute the mean and variance of $S^{\mathcal{P}}_{n,k,A}$, the number of isolated empty $(k-1)$-simplices in $\mathcal{P}_n$ whose left-most vertex is in the set $A$. Recall that $A$ is assumed to be open with $\mathrm{vol}(\partial A) = 0$.



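Lemma 3.13. For $\mu_A$ as in Lemma 3.3,
$$\lim_{n\to\infty} n^{-k} r_n^{-d(k-1)}\, \mathbb{E}\big[S^{\mathcal{P}}_{n,k,A}\big] = \lim_{n\to\infty} n^{-k} r_n^{-d(k-1)}\, \mathrm{Var}\big(S^{\mathcal{P}}_{n,k,A}\big) = \frac{\mu_A}{k!}.$$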



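Proof. Let $h_{r_n,A}(\{x_1, \ldots, x_k\}, \mathcal{X})$ be the indicator that $\{x_1, \ldots, x_k\} \subseteq \mathcal{X}$ form an isolated empty $(k-1)$-simplex in $\mathcal{X}$ whose left-most point is in $A$, and let $h_{r_n,A}(\{x_1, \ldots, x_k\})$ be the indicator that $\{x_1, \ldots, x_k\}$ forms an empty $(k-1)$-simplex with left-most point in $A$, with no isolation requirement. Then
$$S^{\mathcal{P}}_{n,k,A} = \sum_{\mathcal{Y} \subseteq \mathcal{P}_n} h_{r_n,A}(\mathcal{Y}, \mathcal{P}_n). \tag{6}$$
Now, by Theorem 3.11,
$$\mathbb{E}\big[S^{\mathcal{P}}_{n,k,A}\big] = \frac{n^k}{k!}\, \mathbb{E}\big[h_{r_n,A}(\mathcal{X}'_k, \mathcal{X}'_k \cup \mathcal{P}_n)\big],$$
and $\mathbb{E}[h_{r_n,A}(\mathcal{X}'_k, \mathcal{X}'_k \cup \mathcal{P}_n)] \le \mathbb{E}[h_{r_n,A}(\mathcal{X}'_k)] \sim r_n^{d(k-1)} \mu_A$. Note that the conditional probability that $\mathcal{X}'_k$ is isolated from $\mathcal{P}_n$, given that $\mathcal{X}'_k$ forms an empty $(k-1)$-simplex with left-most vertex in $A$, is bounded below by the probability that there are no points of $\mathcal{P}_n$ in the ball of radius $4r_n$ about $X'_1$, which is given by
$$e^{-n \int_{B_{4r_n}(X'_1)} f(x)\,dx} \ge e^{-n \|f\|_\infty \theta_d (4r_n)^d}$$
($\theta_d$ denoting the volume of the unit ball in $\mathbb{R}^d$), since $\mathcal{P}_n$ is a Poisson process with intensity $n f(\cdot)$. It thus follows that
$$\mathbb{E}\big[h_{r_n,A}(\mathcal{X}'_k, \mathcal{X}'_k \cup \mathcal{P}_n)\big] \ge e^{-n \|f\|_\infty \theta_d (4r_n)^d}\, \mathbb{E}\big[h_{r_n,A}(\mathcal{X}'_k)\big] \sim e^{-n \|f\|_\infty \theta_d (4r_n)^d}\, r_n^{d(k-1)} \mu_A.$$
Since $n r_n^d \to 0$, this shows that
$$\mathbb{E}\big[S^{\mathcal{P}}_{n,k,A}\big] \sim \frac{n^k r_n^{d(k-1)} \mu_A}{k!}.$$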

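A similar approach is taken to compute the variance:
$$\mathbb{E}\Big[\big(S^{\mathcal{P}}_{n,k,A}\big)^2\Big] = \mathbb{E}\Big[\sum_{\mathcal{Y} \subseteq \mathcal{P}_n} h_{r_n,A}(\mathcal{Y}, \mathcal{P}_n)\Big] + \sum_{j=0}^{k-1} \mathbb{E}\Big[\sum_{\substack{\mathcal{Y}, \mathcal{Y}' \subseteq \mathcal{P}_n \\ |\mathcal{Y} \cap \mathcal{Y}'| = j}} h_{r_n,A}(\mathcal{Y}, \mathcal{P}_n)\, h_{r_n,A}(\mathcal{Y}', \mathcal{P}_n)\Big].$$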
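The first summand has already been analyzed: $\mathbb{E}[S^{\mathcal{P}}_{n,k,A}] \sim \frac{n^k r_n^{d(k-1)} \mu_A}{k!}$. For the second, observe first that the terms corresponding to $j \neq 0$ vanish: $h_{r_n,A}(\mathcal{Y}, \mathcal{P}_n)\, h_{r_n,A}(\mathcal{Y}', \mathcal{P}_n) = 0$ if $|\mathcal{Y} \cap \mathcal{Y}'| = j \in \{1, \ldots, k-1\}$, because if $\mathcal{Y}$ and $\mathcal{Y}'$ both form empty $(k-1)$-simplices and share a vertex, then neither is isolated. When $j = 0$, applying Theorem 3.12 yields
$$\mathbb{E}\Big[\sum_{\substack{\mathcal{Y}, \mathcal{Y}' \subseteq \mathcal{P}_n \\ \mathcal{Y} \cap \mathcal{Y}' = \emptyset}} h_{r_n,A}(\mathcal{Y}, \mathcal{P}_n)\, h_{r_n,A}(\mathcal{Y}', \mathcal{P}_n)\Big] = \frac{n^{2k}}{(k!)^2}\, \mathbb{E}\big[h_{r_n,A}(\mathcal{X}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n)\, h_{r_n,A}(\mathcal{X}'_{2k} \setminus \mathcal{X}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n)\big],$$
and thus (making use of (6))
$$\mathrm{Var}\big(S^{\mathcal{P}}_{n,k,A}\big) = \mathbb{E}\big[S^{\mathcal{P}}_{n,k,A}\big] + \frac{n^{2k}}{(k!)^2}\Big(\mathbb{E}\big[h_{r_n,A}(\mathcal{X}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n)\, h_{r_n,A}(\mathcal{X}'_{2k} \setminus \mathcal{X}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n)\big] - \big(\mathbb{E}\big[h_{r_n,A}(\mathcal{X}'_k, \mathcal{X}'_k \cup \mathcal{P}_n)\big]\big)^2\Big).$$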



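Now, let $\mathcal{P}'_n$ be an independent copy of $\mathcal{P}_n$. For notational convenience, denote $\mathcal{X}'_{2k} \setminus \mathcal{X}'_k$ by $\mathcal{Y}'_k$ and abbreviate $h_{r_n,A}$ by $h$. Then
$$\begin{aligned}
&\mathbb{E}\big[h(\mathcal{X}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n)\, h(\mathcal{Y}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n)\big] - \big(\mathbb{E}\big[h(\mathcal{X}'_k, \mathcal{X}'_k \cup \mathcal{P}_n)\big]\big)^2 \\
&\quad = \mathbb{E}\big[h(\mathcal{X}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n)\, h(\mathcal{Y}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n) - h(\mathcal{X}'_k, \mathcal{X}'_k \cup \mathcal{P}_n)\, h(\mathcal{Y}'_k, \mathcal{Y}'_k \cup \mathcal{P}'_n)\big] \\
&\quad = \mathbb{E}\Big[\big(h(\mathcal{X}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n) - h(\mathcal{X}'_k, \mathcal{X}'_k \cup \mathcal{P}_n)\big)\, h(\mathcal{Y}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n)\Big] \\
&\qquad + \mathbb{E}\Big[h(\mathcal{X}'_k, \mathcal{X}'_k \cup \mathcal{P}_n)\big(h(\mathcal{Y}'_k, \mathcal{X}'_{2k} \cup \mathcal{P}_n) - h(\mathcal{Y}'_k, \mathcal{Y}'_k \cup \mathcal{P}_n)\big)\Big] \\
&\qquad + \mathbb{E}\Big[h(\mathcal{X}'_k, \mathcal{X}'_k \cup \mathcal{P}_n)\big(h(\mathcal{Y}'_k, \mathcal{Y}'_k \cup \mathcal{P}_n) - h(\mathcal{Y}'_k, \mathcal{Y}'_k \cup \mathcal{P}'_n)\big)\Big] \\
&\quad =: E_1 + E_2 + E_3.
\end{aligned}$$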

Now, observe that in fact $E_1 = 0$: the difference is non-zero only if $\mathcal{X}'_k$ and $\mathcal{Y}'_k$ are connected by an edge, in which case the second factor is zero.

Observe that the difference in $E_2$ is non-positive. Furthermore, it is non-zero only if $\mathcal{X}'_k$ and $\mathcal{Y}'_k$ are connected by an edge and both $\mathcal{X}'_k$ and $\mathcal{Y}'_k$ form empty $(k-1)$-simplices. This probability is bounded above by $\|f\|_\infty^{2k-1} \theta_d^{2k-1} (2r_n)^{2d(k-1)} (8r_n)^d$.

Finally, if $\big[\bigcup_{i=1}^{k} B_{2r_n}(X'_i)\big] \cap \big[\bigcup_{i=k+1}^{2k} B_{2r_n}(X'_i)\big] = \emptyset$, then the two terms of $E_3$ have the same distribution by the spatial independence property of the Poisson process. A contribution from $E_3$ therefore only arises if, in particular, $|X'_1 - X'_j| < 2r_n$ for each $2 \le j \le k$, $|X'_{k+1} - X'_j| < 2r_n$ for each $k+2 \le j \le 2k$, and $|X'_1 - X'_{k+1}| < 8r_n$. The probability of this event is bounded above by $\|f\|_\infty^{2k-1} \theta_d^{2k-1} (2r_n)^{2d(k-1)} (8r_n)^d$.
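It follows that
$$\mathrm{Var}\big(S^{\mathcal{P}}_{n,k,A}\big) = \mathbb{E}\big[S^{\mathcal{P}}_{n,k,A}\big] + \mathcal{E},$$
where
$$|\mathcal{E}| \le \frac{n^{2k}}{(k!)^2} \cdot 2\, \|f\|_\infty^{2k-1} \theta_d^{2k-1}\, 2^{2d(k-1)}\, 8^d\, r_n^{2dk-d} = C(f,d,k)\, \big(n r_n^d\big)^k \big(n^k r_n^{d(k-1)}\big),$$
and $C(f,d,k)$ is a constant depending on $f$, $d$, and $k$. Since $n r_n^d \to 0$, this gives $\mathcal{E} = o\big(n^k r_n^{d(k-1)}\big)$, which completes the proof.

□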
The following abstract normal approximation theorem is another version of the dependency graph approach to Stein's method. It is used in what follows to prove a central limit theorem for $S^{\mathcal{P}}_{n,k}$.

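Theorem 3.14 (Penrose). Suppose $\{\xi_i\}_{i \in I}$ is a finite collection of random variables with dependency graph $(I, \sim)$ with maximum degree $D - 1$, with $\mathbb{E}[\xi_i] = 0$ for each $i$. Set $W := \sum_{i \in I} \xi_i$, and suppose $\mathbb{E}[W^2] = 1$. Let $Z$ be a standard normal random variable. Then for all $t \in \mathbb{R}$,
$$\big|\mathbb{P}[W \le t] - \mathbb{P}[Z \le t]\big| \le \Big(\frac{2}{\pi}\Big)^{1/4} \sqrt{D^2 \sum_{i \in I} \mathbb{E}|\xi_i|^3} + 6\sqrt{D^3 \sum_{i \in I} \mathbb{E}|\xi_i|^4}.$$

Making use of this result, we prove the following.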
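Theorem 3.15. With notation as above, and for $n^k r_n^{d(k-1)} \to \infty$ and $n r_n^d \to 0$,
$$\frac{S^{\mathcal{P}}_{n,k} - \mathbb{E}\big[S^{\mathcal{P}}_{n,k}\big]}{\sqrt{\mathrm{Var}\big(S^{\mathcal{P}}_{n,k}\big)}} \Rightarrow N(0,1).$$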



Proof. To define a dependency graph for the summands of $S^{\mathcal{P}}_{n,k}$, the independence properties of the Poisson process are exploited. Let $\{Q_{i,n}\}_{i \in \mathbb{N}}$ be a partition of $\mathbb{R}^d$ into cubes of side length $r_n$. For the moment, assume that $A$ is a bounded set, and let $I_A$ be the set of indices $i$ such that $A \cap Q_{i,n} \neq \emptyset$. Write
$$S^{\mathcal{P}}_{n,k,A} = \sum_{i \in I_A} \sum_{\mathcal{Y} \subseteq \mathcal{P}_n} h_{r_n, A \cap Q_{i,n}}(\mathcal{Y}, \mathcal{P}_n). \tag{7}$$
Observe that if one defines a relation $\sim$ on $I_A$ by $i \sim j$ if and only if the Euclidean distance from $Q_{i,n}$ to $Q_{j,n}$ is less than $8r_n$, then $(I_A, \sim)$ is a dependency graph for the summands in (7). The degree of vertices in this dependency graph is then bounded by $17^d$.
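The degree bound $17^d$ can be confirmed by counting lattice offsets. The sketch assumes cubes of the form $r_n(i + [0,1)^d)$ indexed by $i \in \mathbb{Z}^d$, so that the gap between two cubes along an axis is $r_n \max(|\delta| - 1, 0)$ for an integer offset $\delta$; the helper name is hypothetical. Every counted offset has $|\delta| \le 8$ in each coordinate, which is where $17^d$ comes from.

```python
from itertools import product


def neighbour_count(d, reach=8):
    """Number of integer offsets delta in Z^d such that the cubes
    r*(i + [0,1)^d) and r*(i + delta + [0,1)^d) are at Euclidean distance
    < reach * r (the cube itself, delta = 0, is included)."""
    count = 0
    span = range(-(reach + 1), reach + 2)  # wide enough to contain every neighbour
    for delta in product(span, repeat=d):
        # Axis-wise gap between the two cubes, in units of r.
        gaps = [max(abs(t) - 1, 0) for t in delta]
        if sum(g * g for g in gaps) < reach * reach:
            count += 1
    return count


for d in (1, 2, 3):
    print(d, neighbour_count(d), 17 ** d)
```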

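Let $\xi_i := \sum_{\mathcal{Y} \subseteq \mathcal{P}_n} h_{r_n, A \cap Q_{i,n}}(\mathcal{Y}, \mathcal{P}_n)$. To apply Theorem 3.14, bounds are needed for $\mathbb{E}|\xi_i - \mathbb{E}\xi_i|^p$ for $p = 3, 4$, for which it suffices to have bounds on $\mathbb{E}|\xi_i|^p$ for $p = 3, 4$. Observe that if $Z_i$ is the number of points of $\mathcal{P}_n$ within $2r_n$ of $Q_{i,n}$, then $Z_i$ is distributed as a Poisson random variable with mean $n \int_{(Q_{i,n})_{2r_n}} f(x)\,dx$, where $(Q_{i,n})_{2r_n}$ denotes the $2r_n$-neighborhood of $Q_{i,n}$, and
$$|\xi_i| \le Z_i (Z_i - 1) \cdots (Z_i - k + 1) =: (Z_i)_k.$$
It follows that there is a constant $c$ depending only on $d$ and $f$ such that, for $p_n := n r_n^d$,
$$\mathbb{E}|\xi_i|^p \le \mathbb{E}\big[(Z_i)_k^p\big] \le \sum_{j=k}^{\infty} (j)_k^p\, \frac{(c\, p_n)^j}{j!} \le c' p_n^k$$
for some new constant $c'$ depending only on $d$, $f$, and $k$.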
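The bound $\mathbb{E}[(Z_i)_k^p] = O(p_n^k)$ reflects the fact that, for a Poisson variable with small mean $m$, the event $Z_i \ge k$ has probability of order $m^k$. A short numerical sketch, with $(j)_k$ computed via `math.perm` and the series truncated (an assumption that is harmless for small $m$): the ratio $\mathbb{E}[(Z)_k^p]/m^k$ stays bounded and tends to $(k!)^{p-1}$ as $m \to 0$. The function name is hypothetical.

```python
import math


def falling_moment(m, k, p, terms=80):
    """E[((Z)_k)^p] for Z ~ Poisson(m), where (Z)_k = Z(Z-1)...(Z-k+1).

    The series is truncated after `terms` terms, which is ample for m <= 1.
    """
    total = 0.0
    for j in range(k, terms):  # (j)_k = 0 for j < k
        log_pmf = j * math.log(m) - m - math.lgamma(j + 1)
        total += math.perm(j, k) ** p * math.exp(log_pmf)
    return total


k, p = 3, 3
for m in (1e-1, 1e-2, 1e-3, 1e-4):
    # Ratio to m^k stays bounded and approaches (k!)^(p-1) = 36 as m -> 0.
    print(m, falling_moment(m, k, p) / m ** k)
```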

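Note that since $A$ is bounded, $|I_A|$ is at worst of the order $r_n^{-d}$, with coefficient depending on $A$. Applying Theorem 3.14 to the random variables $(\xi_i - \mathbb{E}\xi_i)/\sqrt{\mathrm{Var}(S^{\mathcal{P}}_{n,k,A})}$, $i \in I_A$, and using Lemma 3.13, gives
$$\sup_{t \in \mathbb{R}} \bigg|\mathbb{P}\bigg[\frac{S^{\mathcal{P}}_{n,k,A} - \mathbb{E} S^{\mathcal{P}}_{n,k,A}}{\sqrt{\mathrm{Var}(S^{\mathcal{P}}_{n,k,A})}} \le t\bigg] - \mathbb{P}[Z \le t]\bigg| \le C_A \Big[\big(n^k r_n^{d(k-1)}\big)^{-1/4} + \big(n^k r_n^{d(k-1)}\big)^{-1/2}\Big],$$
which tends to zero as $n$ tends to infinity.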
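The exponents in this bound come from a short computation, sketched here under the assumptions already in force: $\sigma_n^2 := \mathrm{Var}(S^{\mathcal{P}}_{n,k,A})$ is of order $n^k r_n^{d(k-1)}$ by Lemma 3.13, $|I_A| = O(r_n^{-d})$, $D \le 17^d$, and $\mathbb{E}|\xi_i - \mathbb{E}\xi_i|^p \le C p_n^k$ for $p = 3, 4$, with $C$ a constant that may change from line to line. Since $r_n^{-d}(n r_n^d)^k = n^k r_n^{d(k-1)}$,
$$\begin{aligned}
D^2 \sum_{i \in I_A} \mathbb{E}\Big|\frac{\xi_i - \mathbb{E}\xi_i}{\sigma_n}\Big|^3
  &\le C\, \frac{r_n^{-d}\,(n r_n^d)^k}{\big(n^k r_n^{d(k-1)}\big)^{3/2}}
   = C\,\big(n^k r_n^{d(k-1)}\big)^{-1/2}, \\
D^3 \sum_{i \in I_A} \mathbb{E}\Big|\frac{\xi_i - \mathbb{E}\xi_i}{\sigma_n}\Big|^4
  &\le C\, \frac{r_n^{-d}\,(n r_n^d)^k}{\big(n^k r_n^{d(k-1)}\big)^{2}}
   = C\,\big(n^k r_n^{d(k-1)}\big)^{-1},
\end{aligned}$$
and taking square roots in Theorem 3.14 produces the two terms $(n^k r_n^{d(k-1)})^{-1/4}$ and $(n^k r_n^{d(k-1)})^{-1/2}$.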
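To move to $A = \mathbb{R}^d$, let $\zeta_{n,k}(A) := \dfrac{S^{\mathcal{P}}_{n,k,A} - \mathbb{E} S^{\mathcal{P}}_{n,k,A}}{\sqrt{n^k r_n^{d(k-1)}}}$ and consider $A_K := (-K, K)^d$ and $A^K := \mathbb{R}^d \setminus [-K, K]^d$. Given $t \in \mathbb{R}$ and $\varepsilon > 0$,
$$\begin{aligned}
\mathbb{P}[\zeta_{n,k}(\mathbb{R}^d) \le t] &= \mathbb{P}[\zeta_{n,k}(A_K) \le t - \varepsilon] - \mathbb{P}\big[\{\zeta_{n,k}(A_K) \le t - \varepsilon\} \cap \{\zeta_{n,k}(\mathbb{R}^d) > t\}\big] \\
&\quad + \mathbb{P}\big[\{|\zeta_{n,k}(A_K) - t| < \varepsilon\} \cap \{\zeta_{n,k}(\mathbb{R}^d) \le t\}\big] \\
&\quad + \mathbb{P}\big[\{\zeta_{n,k}(A_K) \ge t + \varepsilon\} \cap \{\zeta_{n,k}(\mathbb{R}^d) \le t\}\big].
\end{aligned}$$
Now, $\zeta_{n,k}(\mathbb{R}^d) = \zeta_{n,k}(A_K) + \zeta_{n,k}(A^K)$ almost surely since $\mathrm{vol}\big((A_K \cup A^K)^c\big) = 0$, so
$$\big|\mathbb{P}[\zeta_{n,k}(\mathbb{R}^d) \le t] - \mathbb{P}[\zeta_{n,k}(A_K) \le t - \varepsilon]\big| \le \mathbb{P}\big[|\zeta_{n,k}(A^K)| \ge \varepsilon\big] + \mathbb{P}\big[|\zeta_{n,k}(A_K) - t| < \varepsilon\big].$$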

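By Chebyshev's inequality and the central limit theorem already established for bounded sets, this last expression is bounded above by
$$\frac{1}{\varepsilon^2}\mathrm{Var}\big(\zeta_{n,k}(A^K)\big) + \frac{2\varepsilon \sqrt{n^k r_n^{d(k-1)}}}{\sqrt{2\pi\, \mathrm{Var}\big(S^{\mathcal{P}}_{n,k,A_K}\big)}} + C_K \big(n^k r_n^{d(k-1)}\big)^{-1/4}$$
for a constant $C_K$ depending on $K$. Taking $n$ to infinity for $K$ and $\varepsilon$ fixed yields, by Lemma 3.13,
$$\limsup_{n\to\infty} \big|\mathbb{P}[\zeta_{n,k}(\mathbb{R}^d) \le t] - \mathbb{P}[\zeta_{n,k}(A_K) \le t - \varepsilon]\big| \le \frac{\mu_{A^K}}{\varepsilon^2\, k!} + \frac{2\varepsilon \sqrt{k!}}{\sqrt{2\pi \mu_{A_K}}},$$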
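which, together with the central limit theorem for $\zeta_{n,k}(A_K)$, implies that
$$\limsup_{n\to\infty} \bigg|\mathbb{P}[\zeta_{n,k}(\mathbb{R}^d) \le t] - \mathbb{P}\bigg[\sqrt{\frac{\mathrm{Var}(S^{\mathcal{P}}_{n,k,A_K})}{n^k r_n^{d(k-1)}}}\, Z \le t - \varepsilon\bigg]\bigg| \le \frac{\mu_{A^K}}{\varepsilon^2\, k!} + \frac{2\varepsilon\sqrt{k!}}{\sqrt{2\pi\mu_{A_K}}}.$$
Now, writing $\Phi$ for the standard normal distribution function,
$$\mathbb{P}\bigg[\sqrt{\frac{\mathrm{Var}(S^{\mathcal{P}}_{n,k,A_K})}{n^k r_n^{d(k-1)}}}\, Z \le t - \varepsilon\bigg] = \Phi\bigg(\sqrt{\frac{n^k r_n^{d(k-1)}}{\mathrm{Var}(S^{\mathcal{P}}_{n,k,A_K})}}\,(t - \varepsilon)\bigg) \xrightarrow[n\to\infty]{} \Phi\bigg(\sqrt{\frac{k!}{\mu_{A_K}}}\,(t - \varepsilon)\bigg);$$
that is,
$$\limsup_{n\to\infty} \bigg|\mathbb{P}[\zeta_{n,k}(\mathbb{R}^d) \le t] - \Phi\bigg(\sqrt{\frac{k!}{\mu_{A_K}}}\,(t - \varepsilon)\bigg)\bigg| \le \frac{\mu_{A^K}}{\varepsilon^2\, k!} + \frac{2\varepsilon\sqrt{k!}}{\sqrt{2\pi\mu_{A_K}}}.$$
Recall that $\lim_{K\to\infty} \mu_{A_K} = \mu$ and $\lim_{K\to\infty} \mu_{A^K} = 0$. Thus, choosing $K$ large enough (depending on $\varepsilon$),
$$\limsup_{n\to\infty} \bigg|\mathbb{P}[\zeta_{n,k}(\mathbb{R}^d) \le t] - \Phi\bigg(\sqrt{\frac{k!}{\mu}}\,(t - \varepsilon)\bigg)\bigg| \le C_{k,\mu}\, \varepsilon$$
for a constant $C_{k,\mu}$ depending only on $k$ and $\mu$.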
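Since $\Phi\big(\sqrt{k!/\mu}\,(t - \varepsilon)\big) \to \Phi\big(\sqrt{k!/\mu}\,t\big)$ as $\varepsilon \to 0$ and $\varepsilon$ was arbitrary, this finally shows that
$$\lim_{n\to\infty} \bigg|\mathbb{P}[\zeta_{n,k}(\mathbb{R}^d) \le t] - \Phi\bigg(\sqrt{\frac{k!}{\mu}}\, t\bigg)\bigg| = 0.$$

□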



The remaining work is to use this result to obtain the same result for $S_{n,k}$ itself. To do so, the following "de-Poissonization" result is used.

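Theorem 3.16 (See [15]). Suppose that for each $n \in \mathbb{N}$, $H_n(\mathcal{X})$ is a real-valued functional on finite sets $\mathcal{X} \subset \mathbb{R}^d$. Suppose that for some $\sigma^2 > 0$,

(i) $\frac{1}{n}\mathrm{Var}\big(H_n(\mathcal{P}_n)\big) \to \sigma^2$, and

(ii) $\frac{1}{\sqrt{n}}\big[H_n(\mathcal{P}_n) - \mathbb{E} H_n(\mathcal{P}_n)\big] \Rightarrow \sigma Z$, for $Z$ a standard normal random variable.

Suppose that there are constants $\alpha \in \mathbb{R}$ and $\gamma \in (\frac{1}{2}, 1)$ such that the increments $R_{m,n} := H_n(\mathcal{X}_{m+1}) - H_n(\mathcal{X}_m)$, where $\mathcal{X}_m := \{X_1, \ldots, X_m\}$, satisfy
$$\lim_{n\to\infty} \Big(\sup_{n - n^\gamma \le m \le n + n^\gamma} \big|\mathbb{E}[R_{m,n}] - \alpha\big|\Big) = 0, \tag{8}$$
$$\lim_{n\to\infty} \Big(\sup_{n - n^\gamma \le m < m' \le n + n^\gamma} \big|\mathbb{E}[R_{m,n} R_{m',n}] - \alpha^2\big|\Big) = 0, \tag{9}$$
and
$$\lim_{n\to\infty} \Big(\frac{1}{\sqrt{n}} \sup_{n - n^\gamma \le m \le n + n^\gamma} \mathbb{E}\big[R_{m,n}^2\big]\Big) = 0. \tag{10}$$
Finally, assume that there is a constant $\beta > 0$ such that, with probability one,
$$|H_n(\mathcal{X}_m)| \le \beta (n + m)^\beta.$$
Then $\alpha^2 \le \sigma^2$, and as $n \to \infty$, $\frac{1}{n}\mathrm{Var}\big(H_n(\mathcal{X}_n)\big) \to \sigma^2 - \alpha^2$ and
$$\frac{1}{\sqrt{n}}\big[H_n(\mathcal{X}_n) - \mathbb{E} H_n(\mathcal{X}_n)\big] \Rightarrow \sqrt{\sigma^2 - \alpha^2}\, Z.$$

In conjunction with Theorem 3.15 this yields the following.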



Theorem 3.17. With notation as above, and for $n^k r_n^{d(k-1)} \to \infty$ and $n r_n^d \to 0$,
$$\frac{S_{n,k} - \mathbb{E}[S_{n,k}]}{\sqrt{\mathrm{Var}(S_{n,k})}} \Rightarrow N(0,1).$$



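Proof. Theorem 3.16 is applied to the functional
$$H_n(\mathcal{X}) := \big(n r_n^d\big)^{(1-k)/2} \sum_{\mathcal{Y} \subseteq \mathcal{X}} h_{r_n}(\mathcal{Y}, \mathcal{X}),$$
where $h_{r_n}(\mathcal{Y}, \mathcal{X})$ is the indicator that $\mathcal{Y}$ forms an isolated empty $(k-1)$-simplex in $\mathcal{X}$, so that $H_n(\mathcal{X}_n) = (n r_n^d)^{(1-k)/2} S_{n,k}$. By Lemma 3.13 (with $A = \mathbb{R}^d$), condition (i) holds with $\sigma^2 = \frac{\mu}{k!}$, and condition (ii) holds by Theorem 3.15.

Let
$$D_{m,n} := \sum_{\mathcal{Y} \subseteq \mathcal{X}_{m+1}} h_{r_n}(\mathcal{Y}, \mathcal{X}_{m+1}) - \sum_{\mathcal{Y} \subseteq \mathcal{X}_m} h_{r_n}(\mathcal{Y}, \mathcal{X}_m),$$
so that $R_{m,n} = (n r_n^d)^{(1-k)/2} D_{m,n}$, and observe that $D_{m,n}$ is the number of isolated empty $(k-1)$-simplices in $\mathcal{X}_{m+1}$ with $X_{m+1}$ as a vertex, minus the number of empty $(k-1)$-simplices in $\mathcal{X}_m$ which are isolated in $\mathcal{X}_m$ but connected to $X_{m+1}$. Thus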



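$$\mathbb{E}[D_{m,n}] = \binom{m}{k-1} \mathbb{E}\big[h_{r_n}(\mathcal{X}_k, \mathcal{X}_{m+1})\big] - \binom{m}{k} \mathbb{E}\Big[h_{r_n}(\mathcal{X}_k, \mathcal{X}_m)\, \mathbb{1}\Big\{X_{m+1} \in \bigcup_{i=1}^k B_{2r_n}(X_i)\Big\}\Big]. \tag{11}$$
It is clear that
$$\big(1 - \|f\|_\infty \theta_d (4r_n)^d\big)^{m+1-k}\, \mathbb{E}\big[h_{r_n}(\mathcal{X}_k)\big] \le \mathbb{E}\big[h_{r_n}(\mathcal{X}_k, \mathcal{X}_{m+1})\big] \le \mathbb{E}\big[h_{r_n}(\mathcal{X}_k)\big], \qquad \mathbb{E}\big[h_{r_n}(\mathcal{X}_k)\big] \sim \mu\, r_n^{d(k-1)},$$
with the upper bound arising from removing the condition that $\mathcal{X}_k$ be a component in $C(\mathcal{X}_{m+1})$ and the lower bound arising by bounding below the conditional probability that $\mathcal{X}_k$ is a component, given that it forms an empty $(k-1)$-simplex. If $\gamma < 1$, then $\lim_{n\to\infty}\big(1 - \|f\|_\infty \theta_d (4r_n)^d\big)^{m+1-k} = 1$, uniformly in $m \in [n - n^\gamma, n + n^\gamma]$; thus $\mathbb{E}[h_{r_n}(\mathcal{X}_k, \mathcal{X}_{m+1})] \sim \mu\, r_n^{d(k-1)}$ uniformly in $m \in [n - n^\gamma, n + n^\gamma]$, and the same is true for $\mathbb{E}[h_{r_n}(\mathcal{X}_k, \mathcal{X}_m)]$.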



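For the second term of (11), observe that, on the event that $\mathcal{X}_k$ forms an empty $(k-1)$-simplex,
$$\mathbb{P}\Big[X_{m+1} \in \bigcup_{i=1}^k B_{2r_n}(X_i) \,\Big|\, X_1, \ldots, X_k\Big] \le \|f\|_\infty \theta_d (4r_n)^d,$$
and $\lim_{n\to\infty} m r_n^d = 0$, uniformly in $m \in [n - n^\gamma, n + n^\gamma]$. That is, the second term is of strictly smaller order than the first. Thus
$$\lim_{n\to\infty} \sup_{n - n^\gamma \le m \le n + n^\gamma} \bigg|\big(n r_n^d\big)^{1-k} \mathbb{E}[D_{m,n}] - \frac{\mu}{(k-1)!}\bigg| = 0.$$
This implies that
$$\lim_{n\to\infty} \sup_{n - n^\gamma \le m \le n + n^\gamma} \Big|\big(n r_n^d\big)^{(1-k)/2} \mathbb{E}[D_{m,n}]\Big| = 0,$$
since $n r_n^d \to 0$ as $n \to \infty$, and so the first increment condition (8) of the theorem is satisfied with $\alpha = 0$ and any choice of $\gamma \in (\frac{1}{2}, 1)$.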

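Next, consider the quantity $\mathbb{E}[D_{m,n} D_{m',n}]$ for $m \le m'$. Recall that
$$D_{m,n} = \sum_{\substack{\mathcal{Y} \subseteq \mathcal{X}_m \\ |\mathcal{Y}| = k-1}} h_{r_n}(\mathcal{Y} \cup \{X_{m+1}\}, \mathcal{X}_{m+1}) - \sum_{\substack{\mathcal{Y} \subseteq \mathcal{X}_m \\ |\mathcal{Y}| = k}} h_{r_n}(\mathcal{Y}, \mathcal{X}_m)\, \mathbb{1}\Big\{X_{m+1} \in \bigcup_{y \in \mathcal{Y}} B_{2r_n}(y)\Big\}.$$
First consider the contribution to $\mathbb{E}[D_{m,n} D_{m',n}]$ from terms of the form
$$\mathbb{E}\big[h_{r_n}(\mathcal{Y} \cup \{X_{m+1}\}, \mathcal{X}_{m+1})\, h_{r_n}(\mathcal{Y}' \cup \{X_{m'+1}\}, \mathcal{X}_{m'+1})\big]$$
for $\mathcal{Y}, \mathcal{Y}'$ such that $(\mathcal{Y} \cup \{X_{m+1}\}) \cap \mathcal{Y}' = \emptyset$. By conditioning on the event $h_{r_n}(\mathcal{Y} \cup \{X_{m+1}\}, \mathcal{X}_{m+1}) = 1$, it follows that
$$\mathbb{E}\big[h_{r_n}(\mathcal{Y} \cup \{X_{m+1}\}, \mathcal{X}_{m+1})\, h_{r_n}(\mathcal{Y}' \cup \{X_{m'+1}\}, \mathcal{X}_{m'+1})\big] \le (1 + o(1))\, \big(\mu\, r_n^{d(k-1)}\big)^2 \zeta,$$
where $\zeta$ is the conditional probability that $\mathcal{Y}' \cup \{X_{m'+1}\}$ is a component in $\mathcal{X}_{m'+1}$, given that it forms an empty $(k-1)$-simplex and that $\mathcal{Y} \cup \{X_{m+1}\}$ forms an empty $(k-1)$-simplex which is not connected to any other points of $\mathcal{X}_{m+1}$. Note that if $m = m'$ then $\zeta = 0$. Otherwise, simply bound $\zeta \le 1$, so that these terms have asymptotic order bounded above by $\mu^2 r_n^{2d(k-1)}$, uniformly in $m$. The number of such terms is bounded by
$$\bigg[\frac{(n + n^\gamma)^{k-1}}{(k-1)!}\bigg]^2.$$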
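Note that if $(\mathcal{Y} \cup \{X_{m+1}\}) \cap \mathcal{Y}' \neq \emptyset$ and $m \neq m'$, then $h_{r_n}(\mathcal{Y} \cup \{X_{m+1}\}, \mathcal{X}_{m+1})\, h_{r_n}(\mathcal{Y}' \cup \{X_{m'+1}\}, \mathcal{X}_{m'+1}) = 0$. If $m = m'$, then it must be that $\mathcal{Y} = \mathcal{Y}'$ to get a non-zero contribution. In this case, one gains a contribution to $\mathbb{E}[D_{m,n}^2]$ of
$$\binom{m}{k-1} \mathbb{E}\big[h_{r_n}(\mathcal{X}_k, \mathcal{X}_{m+1})\big] \le (1 + o(1))\, \frac{(n + n^\gamma)^{k-1}}{(k-1)!}\, \mu\, r_n^{d(k-1)}.$$
Moving on to the cross terms, if $m' = m$ then
$$h_{r_n}(\mathcal{Y} \cup \{X_{m+1}\}, \mathcal{X}_{m+1})\, h_{r_n}(\mathcal{Y}', \mathcal{X}_m)\, \mathbb{1}\Big\{X_{m+1} \in \bigcup_{y \in \mathcal{Y}'} B_{2r_n}(y)\Big\} = 0.$$
If $m < m'$ (or $m > m'$), then
$$\mathbb{E}\Big[h_{r_n}(\mathcal{Y} \cup \{X_{m+1}\}, \mathcal{X}_{m+1})\, h_{r_n}(\mathcal{Y}', \mathcal{X}_{m'})\, \mathbb{1}\Big\{X_{m'+1} \in \bigcup_{y \in \mathcal{Y}'} B_{2r_n}(y)\Big\}\Big] \le \mathbb{E}\big[h_{r_n}(\mathcal{Y} \cup \{X_{m+1}\}, \mathcal{X}_{m+1})\, h_{r_n}(\mathcal{Y}', \mathcal{X}_{m'})\big]\, \|f\|_\infty \theta_d (4r_n)^d.$$
Again, to get a non-zero contribution, it must be that $(\mathcal{Y} \cup \{X_{m+1}\}) \cap \mathcal{Y}' = \emptyset$. In this case, the expression above is bounded above by
$$(1 + o(1))\, \big(\mu\, r_n^{d(k-1)}\big)^2\, \|f\|_\infty \theta_d (4r_n)^d.$$
The number of such terms is bounded by $\binom{m}{k-1}\binom{m'}{k} \le \frac{(n + n^\gamma)^{2k-1}}{k!\,(k-1)!}$.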

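For the product of the second sums from $D_{m,n}$ and $D_{m',n}$, we have already seen that the conditional probability that $X_{m+1} \in \bigcup_{y \in \mathcal{Y}} B_{2r_n}(y)$ given $\mathcal{Y}$ is bounded above by $\|f\|_\infty \theta_d (4r_n)^d$ (on the event that $\mathcal{Y}$ forms an empty $(k-1)$-simplex), and so if $m = m'$, the diagonal terms $\mathcal{Y} = \mathcal{Y}'$ contribute
$$\mathbb{E}\Big[\sum_{\mathcal{Y} \subseteq \mathcal{X}_{m}} h_{r_n}(\mathcal{Y}, \mathcal{X}_{m})\, \mathbb{1}\Big\{X_{m+1} \in \bigcup_{y \in \mathcal{Y}} B_{2r_n}(y)\Big\}\Big] \le (1 + o(1))\, \frac{(n + n^\gamma)^k}{k!}\, \mu\, r_n^{d(k-1)}\, \|f\|_\infty \theta_d (4r_n)^d,$$
while if $\mathcal{Y} \neq \mathcal{Y}'$, one needs to bound
$$\mathbb{E}\Big[h_{r_n}(\mathcal{Y}, \mathcal{X}_m)\, \mathbb{1}\Big\{X_{m+1} \in \bigcup_{y \in \mathcal{Y}} B_{2r_n}(y)\Big\}\, h_{r_n}(\mathcal{Y}', \mathcal{X}_{m'})\, \mathbb{1}\Big\{X_{m'+1} \in \bigcup_{y \in \mathcal{Y}'} B_{2r_n}(y)\Big\}\Big].$$
For $m, m'$ with $\mathcal{Y} \subseteq \mathcal{X}_m$ and $\mathcal{Y}' \subseteq \mathcal{X}_{m'}$, let $\xi$ be the indicator that $\mathcal{Y}$ forms an empty $(k-1)$-simplex and $\eta$ the indicator that it is a component in $\mathcal{X}_m$. Let $\xi'$ and $\eta'$ be the corresponding indicators that $\mathcal{Y}'$ is an empty $(k-1)$-simplex and that it is a component in $\mathcal{X}_{m'}$. Let $\zeta$ and $\zeta'$ be the indicators that $X_{m+1}$ is connected to $\mathcal{Y}$ and that $X_{m'+1}$ is connected to $\mathcal{Y}'$, respectively. Then what is needed is a bound on $\mathbb{E}[\xi\eta\zeta\,\xi'\eta'\zeta']$.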

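Note that for the product to be non-zero, it must be that $(\mathcal{Y} \cup \{X_{m+1}\}) \cap \mathcal{Y}' = \emptyset$. Now,
$$\mathbb{P}\big[\zeta\zeta' = 1 \,\big|\, \xi\eta\xi'\eta' = 1\big] \le \frac{\big(\|f\|_\infty \theta_d (4r_n)^d\big)^2}{\int_{\bigcap_{y \in \mathcal{Y}'} B_{2r_n}(y)^c} f(x)\,dx} \le \frac{\big(\|f\|_\infty \theta_d (4r_n)^d\big)^2}{1 - \|f\|_\infty \theta_d (4r_n)^d},$$
since if $\xi\eta\xi'\eta' = 1$, then $\mathcal{Y}$ and $\mathcal{Y}'$ make up empty $(k-1)$-simplices; and moreover, while nothing at all is known about $X_{m'+1}$, it is known that $X_{m+1}$ is not connected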


LIMIT THEOREMS FOR BETTI NUMBERS 






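to $\mathcal{Y}'$. Trivially, $\mathbb{P}[\eta\eta' = 1 \mid \xi\xi' = 1] \le 1$, and $\mathbb{P}[\xi\xi' = 1] = \mathbb{P}[\xi = 1]\,\mathbb{P}[\xi' = 1] \sim \mu^2 r_n^{2d(k-1)}$, since $\mathcal{Y} \cap \mathcal{Y}' = \emptyset$. Thus
$$\mathbb{E}\Big[\sum_{\mathcal{Y} \subseteq \mathcal{X}_m} \sum_{\mathcal{Y}' \subseteq \mathcal{X}_{m'}} h_{r_n}(\mathcal{Y}, \mathcal{X}_m)\, \mathbb{1}\Big\{X_{m+1} \in \bigcup_{y \in \mathcal{Y}} B_{2r_n}(y)\Big\}\, h_{r_n}(\mathcal{Y}', \mathcal{X}_{m'})\, \mathbb{1}\Big\{X_{m'+1} \in \bigcup_{y \in \mathcal{Y}'} B_{2r_n}(y)\Big\}\Big] \le c_{d,f}\, \frac{(n + n^\gamma)^{2k}}{(k!)^2}\, \mu^2 r_n^{2dk}.$$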

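It now follows that $|\mathbb{E}[D_{m,n} D_{m',n}]| \le c_{d,f,k} (n r_n^d)^k$ for all $m, m' \in [n - n^\gamma, n + n^\gamma]$ with $m \neq m'$, and so
$$\lim_{n\to\infty} \sup_{n - n^\gamma \le m < m' \le n + n^\gamma} \big(n r_n^d\big)^{1-k} \big|\mathbb{E}[D_{m,n} D_{m',n}]\big| = 0.$$
If $m = m'$, then $\mathbb{E}[D_{m,n}^2] \le c_{d,f,k} (n r_n^d)^{k-1}$, and so
$$\lim_{n\to\infty} \sup_{n - n^\gamma \le m \le n + n^\gamma} \frac{1}{\sqrt{n}} \big(n r_n^d\big)^{1-k} \mathbb{E}[D_{m,n}^2] = 0.$$
Thus the increment conditions (9) and (10) of the theorem are satisfied with $\alpha = 0$.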
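Finally, observe that
$$|H_n(\mathcal{X}_m)| \le \binom{m}{k} \big(n r_n^d\big)^{(1-k)/2} = \sqrt{n}\, \binom{m}{k} \big(n^k r_n^{d(k-1)}\big)^{-1/2}.$$
Since $n^k r_n^{d(k-1)}$ is assumed to go to infinity as $n \to \infty$, this gives $|H_n(\mathcal{X}_m)| \le (n + m)^{k+1}$ for all sufficiently large $n$, so the polynomial boundedness condition of Theorem 3.16 is satisfied, and the central limit theorem for $S_{n,k}$ is proved.

□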



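As was previously noted, the fact that the same central limit theorem holds for the upper and lower bounds for $\beta_k$ given in (5) immediately yields part (iii) of Theorem 3.2.

Theorem 3.18. With notation as above, if $n r_n^d \to 0$ and $n^k r_n^{d(k-1)} \to \infty$, then
$$\frac{\beta_{k-1} - \mathbb{E}[S_{n,k}]}{\sqrt{\mathbb{E}[S_{n,k}]}} \Rightarrow N(0,1).$$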



4. Vietoris-Rips complexes

Vietoris-Rips complexes were introduced by Leopold Vietoris in the context of algebraic topology, and independently by Eliyahu Rips in the context of geometric group theory. These complexes continue to be a useful construction in both fields, and are also useful in computational topology: although they do not carry the same homotopy information that the Cech complex does, the fact that they are determined by their underlying graph makes them much smaller in memory and more amenable to certain kinds of calculation.

Let $f : \mathbb{R}^d \to \mathbb{R}_{\ge 0}$ be a bounded measurable density function and let $\mathcal{X}_n$ denote a set of $n$ points drawn independently from this distribution. For any $r > 0$ define a (random geometric) graph $G(n, r)$ on $\mathcal{X}_n$ by inserting an edge $\{x, y\}$ whenever $d(x, y) < 2r$. Usually $r = r(n)$ and we consider the limit as $n$ tends to infinity.
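A minimal Python sketch of the construction just described (the function name `rips_simplices` is hypothetical): vertices $i, j$ are adjacent when their distance is less than $2r$, and a vertex set spans a simplex exactly when it is a clique. Brute-force clique enumeration is used, which is only practical for small point sets.

```python
import math
from itertools import combinations


def rips_simplices(points, r, max_dim=2):
    """Simplices of VR(points; r) up to dimension max_dim.

    Vertices i and j are adjacent when d(x_i, x_j) < 2r (the graph G(n, r));
    a vertex set spans a simplex exactly when it is a clique.
    """
    n = len(points)
    adjacent = {(i, j): math.dist(points[i], points[j]) < 2 * r
                for i, j in combinations(range(n), 2)}
    simplices = {0: [(i,) for i in range(n)]}
    for dim in range(1, max_dim + 1):
        simplices[dim] = [s for s in combinations(range(n), dim + 1)
                          if all(adjacent[e] for e in combinations(s, 2))]
    return simplices


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print({dim: len(s) for dim, s in rips_simplices(square, 0.6).items()})
# -> {0: 4, 1: 4, 2: 0}: an induced 4-cycle
```

For the corners of a unit square with $r = 0.6$ the sides are edges but the diagonals (length $\sqrt{2} > 1.2$) are not, so the complex is an induced 4-cycle, which is the 1-skeleton of the boundary of the 2-dimensional cross-polytope.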
[Figure 2 plot: "Random geometric complex on n = 100 vertices"; horizontal axis: Scale (r).]

Figure 2. The Betti numbers of VR(n,r) plotted vertically against r horizontally; n = 100. Computation and graphic courtesy of Afra Zomorodian.



The random Vietoris-Rips complex $VR(n, r)$ is the clique complex of this random geometric graph; that is, the maximal simplicial complex with 1-skeleton $G(n, r)$. To see the contrast with $X(n, p)$, Figure 2 has a picture of the Betti numbers of a random Rips complex $VR(n, r)$ on 100 uniform points in a 6-dimensional cube, with $n = 100$ and $0 < r < 1$; compare with Figure 1.

In the sparse range of parameter, $r = o(n^{-1/d})$, a formula for the asymptotic expectation of $\beta_k$ was given in [9].

Theorem 4.1. For $d \ge 2$, $k \ge 1$, $\varepsilon > 0$, and $r_n = O(n^{-1/d - \varepsilon})$, the expectation of the $k$th Betti number of the random Vietoris-Rips complex $VR(\mathcal{X}_n; r_n)$ satisfies
$$\frac{\mathbb{E}[\beta_k]}{n^{2k+2} r_n^{d(2k+1)}} \to C_k$$
as $n \to \infty$, where $C_k$ is a constant that depends only on $k$ and the underlying density function $f$.

In the same regime we prove limit theorems for $\beta_k$.

Theorem 4.2. With the same hypotheses as in Theorem 4.1,

(i) if $n^{2k+2} r_n^{d(2k+1)} \to 0$ as $n \to \infty$, then $\beta_k\big(VR(\mathcal{X}_n; r_n)\big) = 0$ a.a.s.;

(ii) if $n^{2k+2} r_n^{d(2k+1)} \to \alpha \in (0, \infty)$ as $n \to \infty$, then
$$d_{TV}\big(\beta_k(VR(\mathcal{X}_n; r_n)), Y\big) \le c\,\alpha\, n r_n^d,$$
where $Y$ is a Poisson random variable with $\mathbb{E}[Y] = \mathbb{E}[\beta_k]$ and $c$ is a constant depending only on $d$, $k$, and $f$;

(iii) if $n^{2k+2} r_n^{d(2k+1)} \to \infty$, then
$$\frac{\beta_k - \mathbb{E}[\beta_k]}{\sqrt{\mathrm{Var}(\beta_k)}} \Rightarrow N(0,1).$$

(The case $k = 0$ is handled in detail by Penrose [15].)

The main idea of the proof of Theorem 4.2 is again to bound $\beta_k$ between two random variables which satisfy the same central limit theorem. The intuition behind the bounds is that almost all of the homology of $VR(n, r)$ is contributed from a single source: the octahedral components. This is essentially because they are the smallest possible support of homology (smallest in the sense of vertex support), in the same way that empty $(k-1)$-simplices were the smallest possible support of homology in the previous section.

Definition 4.3. The $(k+1)$-dimensional cross-polytope is defined to be the convex hull of the $2k+2$ points $\{\pm e_i\}$, where $e_1, e_2, \ldots, e_{k+1}$ are the standard basis vectors of $\mathbb{R}^{k+1}$. The boundary of this polytope is a $k$-dimensional simplicial complex, denoted $O_k$.

Simplicial complexes which arise as clique complexes of graphs are sometimes called flag complexes. A useful fact in combinatorial topology is the following; for a proof see [11].

Lemma 4.4. If $\Delta$ is a flag complex, then any nontrivial element of $k$-dimensional homology $H_k(\Delta)$ is supported on a subcomplex $S$ with at least $2k+2$ vertices. Moreover, if $S$ has exactly $2k+2$ vertices, then $S$ is isomorphic to $O_k$.

Definition 4.5. Let $o_k(\Delta)$ (or $o_k$ if context is clear) denote the number of induced subgraphs of $\Delta$ combinatorially isomorphic to the 1-skeleton of the cross-polytope $O_k$, and let $\tilde{o}_k(\Delta)$ denote the number of components of $\Delta$ combinatorially isomorphic to the 1-skeleton of the cross-polytope $O_k$.

Definition 4.6. Let $f_k^{=i}(\Delta)$ denote the number of $k$-dimensional faces on connected components with exactly $i$ vertices. Similarly, let $f_k^{\ge i}(\Delta)$ denote the number of $k$-dimensional faces on connected components containing at least $i$ vertices.

In [15], Penrose proved the following limit theorems for subgraph counts of random geometric graphs.

Theorem 4.7 (Penrose). Let $\Gamma_1, \ldots, \Gamma_m$ be graphs on $v \ge 2$ vertices, such that $\mathbb{P}[G(v, r) \cong \Gamma_j] > 0$ for each $j$. Let $G_n(\Gamma)$ denote the number of induced subgraphs of $G(n, r_n)$ isomorphic to $\Gamma$. Then, with $r_n$ as in the statement of Theorem 4.1:

(i) There is a constant $\mu_j$ depending only on $\Gamma_j$ and $v$ such that
$$\lim_{n\to\infty} r_n^{-d(v-1)} n^{-v}\, \mathbb{E}[G_n(\Gamma_j)] = \mu_j.$$

(ii) Let $Z_1, \ldots, Z_m$ be independent Poisson random variables with $\mathbb{E} Z_j = \mathbb{E}[G_n(\Gamma_j)]$. There is a constant $c$ depending only on $m$ such that
$$d_{TV}\big[(G_n(\Gamma_1), \ldots, G_n(\Gamma_m)), (Z_1, \ldots, Z_m)\big] \le c\, n^{v+1} r_n^{dv}.$$

(iii) Suppose that $n^v r_n^{d(v-1)} \to \infty$ as $n \to \infty$. Let $\tau_n = \sqrt{n^v r_n^{d(v-1)}}$. Then the joint distribution of the random variables $\big\{\tau_n^{-1}\big(G_n(\Gamma_j) - \mathbb{E}[G_n(\Gamma_j)]\big)\big\}_{j=1}^m$ converges to a centered Gaussian distribution with covariance matrix $\Sigma = \mathrm{diag}(\mu_1, \ldots, \mu_m)$, for $\mu_j$ as in part (i).
Figure 3. The case $k = 2$: the seventeen isomorphism types of subgraphs which arise when extending a 3-clique to a connected graph on 7 vertices with 7 edges. Each subgraph isomorphic to one of these can contribute at most 1 to the sum bounding the error term $f_2^{\ge 7}$.



A dimension bound paired with Lemma 4.4 yields
$$\tilde{o}_k \le \beta_k \le \tilde{o}_k + f_k^{\ge 2k+3}, \tag{12}$$
in analogy to the Morse inequalities used in the first section.

One could work with $f_k^{\ge 2k+3}$ directly, but it turns out to be sufficient to overestimate $f_k^{\ge 2k+3}$ as follows. For each $k$-dimensional face, consider the underlying $(k+1)$-clique; if it is in a component with at least $2k+3$ vertices, extend the clique to a connected subgraph with exactly $2k+3$ vertices and $\binom{k+1}{2} + k + 2$ edges, by the following algorithm.

(i) Set $G$ to be the 1-skeleton of the complex, and initialize $H$ to be the $(k+1)$-clique.

(ii) Find some edge connecting $V(H)$ to $V(G) \setminus V(H)$. Add this edge (and its endpoint) to $H$. This is always possible since by assumption $H$ is contained in a component with at least $2k+3$ vertices.

(iii) Repeat step (ii) until $H$ has exactly $2k+3$ vertices.
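Steps (i) to (iii) can be sketched in Python as follows. The graph is assumed to be given as an adjacency dictionary, and ties in step (ii) are broken by scanning the vertices of $H$ in insertion order and taking the smallest new neighbour; this tie-breaking rule and the name `extend_clique` are illustrative choices, not part of the algorithm as stated.

```python
from itertools import combinations
from math import comb


def extend_clique(adj, clique, k):
    """Grow a (k+1)-clique into a connected subgraph H on 2k+3 vertices by
    repeatedly adding one edge from V(H) to a new vertex (steps (i)-(iii)).

    `adj` maps each vertex to the set of its neighbours. Returns the vertex
    list and edge list of H, or None if the component is too small.
    """
    target = 2 * k + 3
    vertices = list(clique)
    in_h = set(clique)
    edges = [tuple(sorted(e)) for e in combinations(clique, 2)]
    while len(vertices) < target:
        for u in vertices:  # scan H in insertion order (arbitrary tie-break)
            new = sorted(w for w in adj[u] if w not in in_h)
            if new:
                w = new[0]
                break
        else:
            return None  # no edge leaves V(H): component has < 2k+3 vertices
        vertices.append(w)
        in_h.add(w)
        edges.append((min(u, w), max(u, w)))
    return vertices, edges


# k = 1: an edge of the path 0-1-2-3-4-5 extends to a tree on 5 vertices.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 5} for i in range(6)}
vertices, edges = extend_clique(path, (0, 1), 1)
print(vertices, edges, len(edges) == comb(2, 2) + 1 + 2)
```

Each pass adds exactly one vertex and one edge, so the output always has $\binom{k+1}{2} + (k+2)$ edges, matching the count in the text.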

For example, let $k = 2$; then
$$\tilde{o}_2 \le \beta_2 \le \tilde{o}_2 + f_2^{\ge 7}.$$
Up to isomorphism, the seventeen graphs that arise when extending a 2-dimensional face (i.e. a 3-clique) to a minimal connected graph on 7 vertices are exhibited in Figure 3.

In particular, $f_2^{\ge 7} \le \sum_{i=1}^{17} s_i$, where $s_i$ counts the number of subgraphs isomorphic to graph $i$ for some indexing of the seventeen graphs in Figure 3.

In general, one can express the number of graphs on $2k+3$ vertices that can arise from the algorithm above as a function of $k$. Moreover, as is noted in [15], the number of occurrences of a given graph $\Gamma$ on $v$ vertices (that is, the subgraph count corresponding to $\Gamma$) can be written as a linear combination of the induced subgraph counts for those graphs on $v$ vertices which have $\Gamma$ as a subgraph. That is,
$$\tilde{o}_k \le \beta_k \le o_k + g_{2k+3}, \tag{13}$$
where $g_{2k+3}$ is a linear combination of the induced subgraph counts of graphs on $2k+3$ vertices, the number of which depends only on $k$, and the trivial bound $\tilde{o}_k \le o_k$ has been used on the right-hand side.

Figure 4. The case $k = 1$: the three isomorphism types of trees on five vertices. Each subgraph isomorphic to one of these can contribute at most 4 to the sum bounding the error term $f_1^{\ge 5}$.
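The relation between subgraph counts and induced subgraph counts can be seen in the smallest nontrivial case, $v = 3$: every copy of the path with two edges sits inside a 3-vertex set spanning either an induced path (one copy) or a triangle (three copies), so the subgraph count equals (induced paths) + 3 × (triangles). A brute-force Python check of this one instance, with a hypothetical function name and a randomly generated test graph:

```python
import random
from itertools import combinations
from math import comb


def path_counts(n, edges):
    """Return (copies of the 2-edge path as a subgraph, induced 2-edge paths,
    triangles) for a simple graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # A copy of the path is a choice of centre and two of its neighbours.
    copies = sum(comb(len(adj[v]), 2) for v in range(n))
    induced = triangles = 0
    for trio in combinations(range(n), 3):
        m = sum(1 for a, b in combinations(trio, 2) if b in adj[a])
        if m == 2:
            induced += 1
        elif m == 3:
            triangles += 1
    return copies, induced, triangles


rng = random.Random(0)
n = 9
edges = [e for e in combinations(range(n), 2) if rng.random() < 0.4]
copies, induced, triangles = path_counts(n, edges)
print(copies, induced + 3 * triangles)  # the two numbers agree
```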



The induced subgraph counts appearing on the right-hand side of (13) are among the components of a random vector whose joint distribution is identified in Theorem 4.7 (for two different values of $v$), and thus limiting distributions for $o_k$ and $g_{2k+3}$ are known in those regimes. Moreover, it is easy to modify Penrose's proofs (just as in the previous section) to show that
$$d_{TV}(o_k + g_{2k+3}, Y) \le c\,\alpha\, n r_n^d,$$
where $Y$ is a Poisson random variable with $\mathbb{E}[Y] = \mathbb{E}[o_k + g_{2k+3}]$, which in particular yields a central limit theorem if $n^{2k+2} r_n^{d(2k+1)} \to \infty$ as $n \to \infty$.

Obtaining the limiting distribution for the lower bound of (13) is also just as in the previous section; all the proofs go through in exactly the same way, and will therefore not be repeated.

For $k = 1$ there are several ways of extending a 2-clique (i.e. an edge) to a connected graph on 5 vertices and 4 edges. In this case the graph must be a tree, and it is no longer possible to recover the clique from the connected graph. However, there are only three isomorphism types of trees on five vertices, shown in Figure 4. Counting these types of subgraphs may therefore result in an underestimate for $f_1^{\ge 5}$, because some edges might get extended to the same tree. However, each tree has only four edges, and so one can obtain the bound
$$f_1^{\ge 5} \le 4(t_1 + t_2 + t_3),$$
where $t_1, t_2, t_3$ count the number of subgraphs isomorphic to the three trees in Figure 4. The proof is then the same as in the case $k \ge 2$.
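The claim that there are exactly three isomorphism types of trees on five vertices can be verified by brute force: enumerate the 4-edge subsets of $K_5$ that are connected (hence trees), and reduce each to a canonical form by minimizing over all vertex relabellings. The sketch, whose function names are illustrative, also recovers Cayley's count of $5^{3} = 125$ labelled trees.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def canonical_form(n, edges):
    """Lexicographically smallest relabelled edge list over all permutations."""
    return min(
        tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        for perm in permutations(range(n))
    )


def tree_types(n):
    """(number of labelled trees, number of isomorphism types) on n vertices."""
    all_edges = list(combinations(range(n), 2))
    # n - 1 edges on n vertices form a tree exactly when they are connected.
    trees = [t for t in combinations(all_edges, n - 1) if is_connected(n, t)]
    return len(trees), len({canonical_form(n, t) for t in trees})


print(tree_types(5))  # (125, 3)
```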



5. Comments 

We studied here three different kinds of random simplicial complex in order to 
work as generally as possible; however there are various ways in which we believe 
it may be possible to extend our results. 

1. The random Vietoris-Rips and Cech complexes studied here are on Euclidean 
space, but this is mostly a matter of convenience. It would seem that the same 
proofs work, mutatis mutandis, for arbitrary Riemannian manifolds. This may be 
of interest in topological data analysis, as in earlier work of Niyogi, Smale, and 
Weinberger [14]. 
2. It may be possible to extend the central limit theorems for the random Vietoris-Rips and Cech complexes into denser regimes, at least into the thermodynamic limit. We expect, for example, that there exists some $c > 0$ such that CLTs hold for all Betti numbers simultaneously, whenever $r > c\, n^{-1/d}$.

3. An easier argument than those presented here should yield central limit theorems for the Euler characteristic $\chi$ of geometric random complexes, in the sparse range. Again it would be nice to know this in denser regimes, and we would guess that it holds at least partway into the thermodynamic limit.